Abstract

Metropolis algorithms for approximate sampling of probability measures on infinite dimensional Hilbert spaces are considered and a generalization of the preconditioned Crank-Nicolson (pCN) proposal is introduced. The new proposal is able to incorporate information on the measure of interest. A numerical simulation of a Bayesian inverse problem indicates that a Metropolis algorithm with such a proposal performs independently of the state space dimension and the variance of the observational noise. Moreover, a qualitative convergence result is provided by a comparison argument for spectral gaps. In particular, it is shown that the generalization inherits geometric convergence from the Metropolis algorithm with pCN proposal.


Keywords: Markov chain Monte Carlo, Metropolis algorithm, spectral gap, 
conductance, Bayesian inverse problem. 


1 Introduction 

Consider a target probability distribution $\mu$ defined on a possibly infinite dimensional separable Hilbert space $\mathcal{H}$. We are interested in sampling from this probability measure and assume that $\mu$ has a density w.r.t. a Gaussian reference measure $\mu_0$ on $\mathcal{H}$ given by

$$\frac{d\mu}{d\mu_0}(u) = \frac{1}{Z}\exp(-\Phi(u)), \qquad u \in \mathcal{H}. \qquad (1)$$

Here $\Phi\colon \mathcal{H} \to \mathbb{R}_+$ is a measurable function and $Z = \int_{\mathcal{H}} \exp(-\Phi(u))\,\mu_0(du)$ is the normalizing constant. Such probability measures $\mu$ arise as posterior distributions in Bayesian inference with $\mu_0$ as a Gaussian prior. Common examples in infinite dimensional spaces are inferring spatially distributed properties of porous media or stock prices.

Unfortunately, the facts that the normalizing constant $Z$ is typically unknown and that $\Phi$ is only available in the form of function evaluations make it difficult to sample $\mu$ directly. However, Markov chains, and in particular Metropolis-Hastings (MH) algorithms, are applicable for approximate sampling. These algorithms consist of a proposal and an acceptance/rejection step: a state is proposed by a proposal kernel, but it is only accepted with a certain probability which depends on $\mu$. The authors of [1] suggested a modification of a Gaussian random walk proposal which is $\mu_0$-reversible. The latter property leads to a well-defined MH algorithm in infinite dimensional Hilbert spaces, see also [30]. This proposal was later [4] referred to as the preconditioned Crank-Nicolson (pCN) proposal. Remarkably, the Markov chain of the resulting pCN Metropolis algorithm has dimension-independent sampling efficiency, see [4],[13]. This is a significant advantage compared to earlier, popular MH algorithms whose performance usually deteriorates with increasing state space dimension [4],[13],[25].

We extend the pCN proposal to incorporate information about the target measure $\mu$. Such an adaptation might account for the anisotropy of the covariance of $\mu$ or the local curvature of $\Phi$. Intuitively, the resulting Markov chain has on average a larger step size and, thus, explores the state space faster. This idea is not entirely new. It is already mentioned in [29], where it is suggested to adapt the covariance of the proposal to the target measure. Later, in [11], the authors explain how to propose new states using general local metric tensors. Moreover, in [22] the Hessian of the negative log density of $\mu$ is employed as local curvature information to design a stochastic Newton MH method in finite dimensions, and in [5],[17] a Gauss-Newton variant for capturing global curvature in an infinite dimensional setting is outlined.

Our approach for adapting the proposal to the target measure $\mu$ has a similar motivation as the proposals considered in [5],[17]. It comes from a local linearization of the unknown-to-observable map in Bayesian inverse problems. This suggests a particular form for approximating the covariance of the target measure, namely $(C^{-1} + \Gamma)^{-1}$, where $C$ denotes the covariance of the reference measure and $\Gamma$ is a suitable self-adjoint and positive operator. We then consider the class of Gaussian proposals with covariance $C_\Gamma = (C^{-1} + \Gamma)^{-1}$. By enforcing $\mu_0$-reversibility we derive our class of generalized pCN (gpCN) proposal kernels $P_\Gamma$.

In a numerical simulation the resulting Metropolis algorithm seems to perform independently of dimension and variance. Here variance independence refers to the variance of the observational noise, which affects the covariance of the target distribution $\mu$: if the variance of the noise decreases, the measure $\mu$ becomes more concentrated. Our numerical experiments also indicate that other popular MH and random walk algorithms perform worse, i.e., their performance depends on the noise variance.

Moreover, we present a convergence result for the gpCN Metropolis algorithm based on spectral gaps. It is well known, see [24], that for Markov chains with reversible transition kernel $K$ a strictly positive spectral gap, denoted $\operatorname{gap}(K) > 0$, is equivalent to a form of geometric ergodicity. The latter roughly means that, in an appropriate setting, the distribution of the $n$th step of a Markov chain converges exponentially fast to its stationary measure. We refer to Section 2.1 for precise definitions and further details.

Our main theoretical result, stated in Theorem 20, is as follows. Let us assume that the transition kernel $M_0$ of the pCN Metropolis algorithm has a positive spectral gap, i.e., $\operatorname{gap}(M_0) > 0$. Then, for any $\varepsilon > 0$ there is an explicitly given probability measure $\mu_R$ such that

$$\|\mu - \mu_R\|_{tv} < \varepsilon \quad\text{and}\quad \operatorname{gap}(M_{\Gamma,R}) > 0,$$

where $M_{\Gamma,R}$ denotes the transition kernel of the gpCN Metropolis algorithm targeting the measure $\mu_R$ and $\|\cdot\|_{tv}$ is the total variation distance, see (3).

The key to the proof is a new comparison theorem for spectral gaps of Markov chains generated by MH algorithms. In order to apply this comparison argument we show that the proposal kernels of the pCN and gpCN Metropolis algorithms are equivalent and that their Radon-Nikodym derivative belongs to an $L_p$-space for some $p > 1$. We note that in [13] it is proven, under additional assumptions on the density (1), that the pCN Metropolis algorithm has a strictly positive spectral gap. Thus, in this setting the gpCN Metropolis algorithm targeting $\mu_R$ also converges exponentially fast.

The remainder of the paper is organized as follows. In Section 2 we state the precise framework, recall preliminary facts, and give a brief introduction to Markov chain Monte Carlo and MH algorithms including the pCN Metropolis algorithm. The gpCN Metropolis algorithm is motivated and defined in Section 3. In particular, in Section 3.3 we illustrate its superior performance compared to other popular MH algorithms. In Section 4 we state a general result for comparing spectral gaps of MH algorithms and then apply it to the gpCN and pCN Metropolis. Section 5 provides an outlook to gpCN algorithms in infinite dimensions which use Gaussian proposals with state-dependent covariance. For the convenience of the reader we recall some facts about Gaussian measures in Appendix A and relegate more technical proofs to Appendix B.

2 Preliminaries 

Let $\mathcal{H}$ be a separable Hilbert space with inner product and norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$. By $\mathcal{B}(\mathcal{H})$ we denote the corresponding Borel $\sigma$-algebra and by $\mathcal{L}(\mathcal{H})$ the set of all bounded, linear operators $A\colon \mathcal{H} \to \mathcal{H}$. Further, we have a Gaussian measure $\mu_0 = N(0, C)$ on $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$. Here and in the remainder of the paper $C\colon \mathcal{H} \to \mathcal{H}$ denotes a nonsingular covariance operator on $\mathcal{H}$, i.e., a bounded, self-adjoint and positive trace class operator with $\ker C = \{0\}$. By $\mu$ we denote the probability measure of interest on $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$ given through the density defined in (1). Typically, the desired distribution is complicated and the density is only known up to a constant, which makes direct sampling from $\mu$ difficult. This is why Markov chains are used for approximate sampling according to $\mu$.

2.1 Markov chains and spectral gaps 

We give a short introduction to Markov chains and Markov chain Monte Carlo (MCMC) methods on general state spaces. We call a mapping $K\colon \mathcal{H} \times \mathcal{B}(\mathcal{H}) \to [0,1]$ a transition kernel if $K(x, \cdot)$ is a probability measure on $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$ for each $x \in \mathcal{H}$ and $K(\cdot, A)$ is a measurable function for each $A \in \mathcal{B}(\mathcal{H})$. Then, a Markov chain with transition kernel $K$ is a sequence of random variables $(X_n)_{n\in\mathbb{N}}$ mapping from some probability space $(\Omega, \mathcal{A}, \mathbb{P})$ to $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$ and satisfying

$$\mathbb{P}(X_{n+1} \in A \mid X_1, \dots, X_n) = \mathbb{P}(X_{n+1} \in A \mid X_n) = K(X_n, A)$$

almost surely for all $A \in \mathcal{B}(\mathcal{H})$ and $n \in \mathbb{N}$. Most properties of a Markov chain can be expressed as properties of its transition kernel. For example, we say the transition kernel $K$ is $\mu$-reversible if

$$K(u, dv)\,\mu(du) = K(v, du)\,\mu(dv) \qquad (2)$$

in the sense of measures on $\mathcal{H} \times \mathcal{H}$. This property is also known as the detailed balance condition, and it implies that $\mu$ is a stationary or invariant probability measure of a Markov chain with transition kernel $K$, i.e., if $X_1 \sim \mu$, then also $X_2 \sim \mu$.

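Each $\mu$-reversible transition kernel $K$ on $\mathcal{H}$ induces a Markov operator, which we shall also denote by $K$, given by

$$Kf(u) = \int_{\mathcal{H}} f(v)\,K(u, dv), \qquad f \in L_2(\mu),$$

where $L_2(\mu)$ is the Hilbert space of measurable, square integrable functions with respect to $\mu$. By the $\mu$-reversibility, $K\colon L_2(\mu) \to L_2(\mu)$ is a bounded and self-adjoint linear operator. We also introduce the closed subspace

$$L_2^0(\mu) = \Big\{ f \in L_2(\mu) \;\Big|\; \int_{\mathcal{H}} f(u)\,\mu(du) = 0 \Big\}$$

of $L_2(\mu)$ and the operator norm

$$\|K\|_\mu = \sup_{f \in L_2^0(\mu),\, f \neq 0} \frac{\|Kf\|_\mu}{\|f\|_\mu}$$

for $K\colon L_2^0(\mu) \to L_2^0(\mu)$, where $\|\cdot\|_\mu$ denotes the norm of $L_2(\mu)$. Let $\operatorname{spec}(K \mid L_2^0(\mu))$ denote the spectrum of $K$ on $L_2^0(\mu)$. Then, we also have

$$\|K\|_\mu = \sup\{ |\lambda| : \lambda \in \operatorname{spec}(K \mid L_2^0(\mu)) \}.$$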

We define the spectral gap of $K$ (w.r.t. $\mu$) by $\operatorname{gap}(K) = 1 - \|K\|_\mu$. This is an important quantity which can be used to formulate conditions ensuring an exponentially fast convergence of the distribution of $X_n$ to $\mu$. To be more precise, we introduce the total variation distance of two probability measures $\nu_1, \nu_2$ on $\mathcal{H}$ by

$$\|\nu_1 - \nu_2\|_{tv} := \sup_{A \in \mathcal{B}(\mathcal{H})} |\nu_1(A) - \nu_2(A)|. \qquad (3)$$
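On a finite state space the objects above become matrices, which makes them easy to experiment with. The following NumPy sketch is our own illustration under that assumption: $\mu$ is a probability vector, $K$ a $\mu$-reversible transition matrix, and the helper names (`metropolis_matrix`, `spectral_gap`, `tv_distance`, `tv_bruteforce`) as well as the three-state example are hypothetical. It represents $K$ on $L_2(\mu)$ by the symmetric matrix $D^{1/2} K D^{-1/2}$ with $D = \operatorname{diag}(\mu)$, removes the constant functions to obtain $\|K\|_\mu$, and checks that the supremum in (3) equals half the $\ell^1$ distance.

```python
import itertools

import numpy as np


def metropolis_matrix(mu):
    """Hypothetical example: Metropolis kernel on {0,...,d-1} with uniform proposal."""
    d = len(mu)
    K = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            if i != j:
                K[i, j] = min(1.0, mu[j] / mu[i]) / d
        K[i, i] = 1.0 - K[i].sum()  # rejected proposals keep the chain at i
    return K


def spectral_gap(K, mu):
    """gap(K) = 1 - ||K||_mu, where the norm is taken on L_2^0(mu)."""
    sq = np.sqrt(mu)
    # D^{1/2} K D^{-1/2} represents K on L_2(mu); it is symmetric iff K is mu-reversible.
    S = sq[:, None] * K / sq[None, :]
    S = 0.5 * (S + S.T)  # remove round-off asymmetry
    # Project out the constant functions (direction sqrt(mu)) to act on L_2^0(mu).
    P = np.eye(len(mu)) - np.outer(sq, sq)
    return 1.0 - np.max(np.abs(np.linalg.eigvalsh(P @ S @ P)))


def tv_distance(nu1, nu2):
    """Total variation distance (3); on a finite space it is half the l1 distance."""
    return 0.5 * np.abs(np.asarray(nu1) - np.asarray(nu2)).sum()


def tv_bruteforce(nu1, nu2):
    """Supremum in (3) over all subsets A; only feasible for tiny state spaces."""
    d = len(nu1)
    best = 0.0
    for r in range(d + 1):
        for A in itertools.combinations(range(d), r):
            idx = list(A)
            best = max(best, abs(nu1[idx].sum() - nu2[idx].sum()))
    return best


mu = np.array([0.2, 0.3, 0.5])
K = metropolis_matrix(mu)
print("gap(K) =", spectral_gap(K, mu))
print("tv =", tv_distance(np.array([1.0, 0.0, 0.0]), mu))
```

For this particular kernel the eigenvalues of $K$ are $1$, $1/3$ and $1/9$, so $\operatorname{gap}(K) = 2/3$.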

Let $\nu$ be the initial distribution of our Markov chain, i.e., $X_1 \sim \nu$. Then, with $K^0(u, \cdot) = \delta_u$ and

$$K^n(u, A) = \int_{\mathcal{H}} K^{n-1}(v, A)\,K(u, dv), \qquad A \in \mathcal{B}(\mathcal{H}),$$

for $n \in \mathbb{N}$, the distribution of $X_{n+1}$ is given by

$$\nu K^n(A) = \int_{\mathcal{H}} K^n(u, A)\,\nu(du).$$

In the setting above it is well known, see [24, Proposition 2.2], that $\|K\|_\mu < 1$, or equivalently $\operatorname{gap}(K) > 0$, holds iff the transition kernel is $L_2(\mu)$-geometrically ergodic. Here by $L_2(\mu)$-geometric ergodicity we mean that there exists a number $r \in [0, 1)$ such that for any probability measure $\nu$ which has a density $\frac{d\nu}{d\mu} \in L_2(\mu)$ w.r.t. $\mu$ there is a constant $C_\nu < \infty$ such that

$$\|\nu K^n - \mu\|_{tv} \le C_\nu\, r^n, \qquad n \in \mathbb{N}.$$

If the distribution of $X_n$ converges to $\mu$, then the Markov chain can be used for approximate sampling from $\mu$. This leads to Markov chain Monte Carlo methods for the computation of expectations. The mean $\mathbb{E}_\mu(f)$ of a function $f\colon \mathcal{H} \to \mathbb{R}$ w.r.t. $\mu$ can then be approximated by the time average

$$S_{n,n_0}(f) = \frac{1}{n} \sum_{j=1}^{n} f(X_{j+n_0}),$$

where $n$ is the sample size and $n_0$ a burn-in parameter to decrease the influence of the initial distribution. The spectral gap of the transition kernel $K$ of the Markov chain $(X_n)_{n\in\mathbb{N}}$ can then be applied to assess the error of the time average $S_{n,n_0}(f)$.
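A minimal simulation of the time average $S_{n,n_0}(f)$ for a finite-state chain, assuming NumPy; the transition matrix, the values of $f$, the seed and the choices $n = 10^5$, $n_0 = 10^3$ are hypothetical and only serve to show the indexing of $X_{j+n_0}$.

```python
import numpy as np


def simulate_chain(K, x0, steps, rng):
    """Return X_1, ..., X_steps of a finite-state Markov chain with X_1 = x0."""
    cum = np.cumsum(K, axis=1)
    uniforms = rng.random(steps)
    xs = np.empty(steps, dtype=int)
    x = x0
    for t in range(steps):
        xs[t] = x
        # Inverse-CDF sampling from the row K[x, :]; min() guards against round-off.
        x = min(int(np.searchsorted(cum[x], uniforms[t], side="right")), K.shape[0] - 1)
    return xs


def time_average(f_values, n, n0):
    """S_{n,n0}(f) = (1/n) sum_{j=1}^n f(X_{j+n0}); X_1 is stored at index 0."""
    return float(np.mean(f_values[n0:n0 + n]))


mu = np.array([0.2, 0.3, 0.5])
d = len(mu)
K = np.array([[min(1.0, mu[j] / mu[i]) / d if i != j else 0.0 for j in range(d)]
              for i in range(d)])
K += np.diag(1.0 - K.sum(axis=1))
f = np.array([1.0, -2.0, 4.0])                 # hypothetical quantity of interest

rng = np.random.default_rng(0)
n, n0 = 100_000, 1_000
xs = simulate_chain(K, x0=0, steps=n + n0, rng=rng)
estimate = time_average(f[xs], n, n0)
print(estimate, mu @ f)                        # exact mean E_mu(f) = 1.6
```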
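We assume $\operatorname{gap}(K) > 0$ and mention two results. The first is rather classical and due to Kipnis and Varadhan [16]. If the initial distribution is $\mu$ and $f \in L_2(\mu)$, then the scaled error $\sqrt{n}\,(S_{n,n_0}(f) - \mathbb{E}_\mu(f))$ converges weakly to $N(0, \sigma_f^2)$ with

$$\sigma_f^2 = \langle (I + K)(I - K)^{-1}(f - \mathbb{E}_\mu(f)),\, f - \mathbb{E}_\mu(f) \rangle_\mu \le \frac{2\,\|f\|_\mu^2}{\operatorname{gap}(K)},$$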
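where $\langle\cdot,\cdot\rangle_\mu$ denotes the inner product in $L_2(\mu)$. The second result is more recent and provides a non-asymptotic bound for the mean squared error. We have

$$\sup_{\|f\|_4 \le 1} \mathbb{E}\,|S_{n,n_0}(f) - \mathbb{E}_\mu(f)|^2 \le \frac{2}{n \cdot \operatorname{gap}(K)} + \frac{2\,C_\nu\,\|K\|_\mu^{n_0}}{n^2 \cdot \operatorname{gap}(K)^2}$$

with $\|f\|_4 = \big(\int_{\mathcal{H}} |f(u)|^4\,\mu(du)\big)^{1/4}$ and a number $C_\nu > 0$ depending on the initial distribution $\nu$. We refer to [26] for details.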

This shows that $\operatorname{gap}(K)$ is a crucial quantity in the study of Markov chains and the numerical analysis of MCMC methods.
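For a finite, irreducible, $\mu$-reversible chain the asymptotic variance $\sigma_f^2$ from the Kipnis-Varadhan result can be evaluated exactly by solving one linear system. The NumPy sketch below (hypothetical three-state example) does this and compares $\sigma_f^2$ with $2\operatorname{Var}_\mu(f)/\operatorname{gap}(K)$, which is at most $2\|f\|_\mu^2/\operatorname{gap}(K)$. Inverting $I - K$ on mean-zero functions via the matrix $I - K + \mathbf{1}\mu^\top$ is our implementation choice, not something taken from the text.

```python
import numpy as np


def asymptotic_variance(K, mu, f):
    """sigma_f^2 = <(I+K)(I-K)^{-1} g, g>_mu with g = f - E_mu(f); K irreducible, mu-reversible."""
    d = len(mu)
    g = f - mu @ f
    # (I - K + 1 mu^T) h = g has the unique solution h with mu^T h = 0 and (I - K) h = g.
    h = np.linalg.solve(np.eye(d) - K + np.outer(np.ones(d), mu), g)
    return float(np.sum(mu * ((np.eye(d) + K) @ h) * g))


mu = np.array([0.2, 0.3, 0.5])
d = len(mu)
K = np.array([[min(1.0, mu[j] / mu[i]) / d if i != j else 0.0 for j in range(d)]
              for i in range(d)])
K += np.diag(1.0 - K.sum(axis=1))
f = np.array([1.0, -2.0, 4.0])

var_f = float(mu @ f**2 - (mu @ f) ** 2)
sq = np.sqrt(mu)
S = sq[:, None] * K / sq[None, :]
P = np.eye(d) - np.outer(sq, sq)
gap = 1.0 - np.max(np.abs(np.linalg.eigvalsh(P @ (0.5 * (S + S.T)) @ P)))

sigma2 = asymptotic_variance(K, mu, f)
print(sigma2, 2.0 * var_f / gap)   # sigma_f^2 and the bound 2 Var_mu(f) / gap(K)
```

As a consistency check, for the kernel $K(u, \cdot) = \mu$ (independent sampling) the formula returns $\operatorname{Var}_\mu(f)$.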
2.2 Metropolis algorithm with pCN proposal

In this work we focus on Markov chains generated by the Metropolis algorithm. This algorithm employs a transition kernel on $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$ for proposing new states, which we shall denote by $P$ and call the proposal kernel. Moreover, let $\alpha\colon \mathcal{H} \times \mathcal{H} \to [0,1]$ be a measurable function denoting the acceptance probability. Then, a transition of a Markov chain $(X_n)_{n\in\mathbb{N}}$ generated by the Metropolis algorithm can be represented in algorithmic form:

1. Given the current state $X_n = u$, draw independently a sample $v$ of a random variable $V \sim P(u, \cdot)$ and a sample $a$ of a random variable $A \sim \mathrm{Unif}[0,1]$.

2. If $a < \alpha(u, v)$, then set $X_{n+1} = v$, otherwise set $X_{n+1} = u$.
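The two steps translate directly into code. The following Python sketch (NumPy assumed; the function names are hypothetical) performs one Metropolis transition given a proposal sampler and the logarithm of the Radon-Nikodym derivative appearing in (5); working with logarithms avoids overflow when computing $\alpha(u,v)$. The usage example targets a one-dimensional standard normal with a symmetric random walk proposal, a finite-dimensional toy case chosen only for illustration.

```python
import numpy as np


def metropolis_step(u, propose, log_ratio, rng):
    """One transition of the Metropolis algorithm (steps 1 and 2 above).

    propose(u, rng) draws v ~ P(u, .), and log_ratio(u, v) returns the logarithm of the
    Radon-Nikodym derivative in (5), so that alpha(u, v) = min(1, exp(log_ratio(u, v))).
    """
    v = propose(u, rng)                          # step 1: v ~ P(u, .)
    a = rng.random()                             # step 1: a ~ Unif[0, 1]
    alpha = np.exp(min(0.0, log_ratio(u, v)))    # min(1, exp(.)) without overflow
    return (v, True) if a < alpha else (u, False)   # step 2


# Hypothetical finite-dimensional example: standard normal target on R with a symmetric
# random walk proposal, for which the derivative in (5) equals pi(v) / pi(u).
rng = np.random.default_rng(1)
propose = lambda u, rng: u + 2.4 * rng.standard_normal()
log_ratio = lambda u, v: 0.5 * u**2 - 0.5 * v**2

u, samples, accepted = 0.0, [], 0
for _ in range(50_000):
    u, acc = metropolis_step(u, propose, log_ratio, rng)
    samples.append(u)
    accepted += acc
samples = np.array(samples)
print(samples.mean(), samples.var(), accepted / len(samples))
```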

The transition kernel of such a Markov chain is then

$$M(u, dv) = \alpha(u, v)\,P(u, dv) + \delta_u(dv) \int_{\mathcal{H}} (1 - \alpha(u, w))\,P(u, dw) \qquad (4)$$

and we call it the Metropolis kernel. It is well known, see [30], that $M$ is reversible w.r.t. $\mu$ if $\alpha(\cdot,\cdot)$ is chosen as

$$\alpha(u, v) = \min\Big\{1, \frac{d\eta^\perp}{d\eta}(u, v)\Big\}, \qquad u, v \in \mathcal{H}, \qquad (5)$$

where $\frac{d\eta^\perp}{d\eta}$ denotes the Radon-Nikodym derivative of the measures

$$\eta(du, dv) := P(u, dv)\,\mu(du) \quad\text{and}\quad \eta^\perp(du, dv) := P(v, du)\,\mu(dv),$$

which we assume to exist. For finite dimensional state spaces the condition of absolute continuity of $\eta^\perp$ w.r.t. $\eta$ is often easily satisfied. However, for infinite dimensional state spaces this becomes a real issue, since there measures tend to be mutually singular. As pointed out in [1],[4], a possible way to ensure the existence of $\frac{d\eta^\perp}{d\eta}$ is to choose a proposal kernel $P$ which is $\mu_0$-reversible, i.e.,

$$P(u, dv)\,\mu_0(du) = P(v, du)\,\mu_0(dv). \qquad (6)$$

Then, since $\frac{d\mu}{d\mu_0}$ exists and is strictly positive, see (1), it follows that

$$\frac{d\eta^\perp}{d\eta}(u, v) = \exp(\Phi(u) - \Phi(v)) \qquad (7)$$

and, hence, $\alpha(u, v) = \min\{1, \exp(\Phi(u) - \Phi(v))\}$.

We next introduce the Metropolis algorithm with the preconditioned Crank-Nicolson (pCN) proposal, see also [4] for details. The pCN proposal kernel arises from a discretization of an Ornstein-Uhlenbeck process with invariant measure $\mu_0$ and takes the form

$$P_0(u, \cdot) = N\big(\sqrt{1 - s^2}\,u,\; s^2 C\big), \qquad (8)$$

where $s \in [0,1]$ denotes a variance or stepsize parameter. It is straightforward to verify that $P_0$ is $\mu_0$-reversible. Namely, by applying (34) from Appendix A we deduce

$$P_0(u, dv)\,\mu_0(du) = N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} C & \sqrt{1 - s^2}\,C \\ \sqrt{1 - s^2}\,C & C \end{bmatrix} \right) = P_0(v, du)\,\mu_0(dv).$$

In the following we call the resulting Metropolis algorithm with proposal $P_0$ simply the pCN Metropolis algorithm or pCN Metropolis and denote its Metropolis kernel by $M_0$.
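In finite dimensions the pCN Metropolis algorithm can be sketched in a few lines. The NumPy code below is an illustration under the assumption that $C$ is given through a factor `C_sqrt` with `C_sqrt @ C_sqrt.T` equal to $C$; the names and the one-dimensional test problem (prior $N(0,1)$, $\Phi(u) = (u-1)^2/2$, hence target $N(0.5, 0.5)$) are hypothetical. Note that only $\Phi$ enters the acceptance step, as in (7).

```python
import numpy as np


def pcn_metropolis(phi, C_sqrt, s, u0, steps, rng):
    """pCN Metropolis in R^d: proposal (8) and alpha(u, v) = min{1, exp(Phi(u) - Phi(v))}.

    C_sqrt is any matrix with C_sqrt @ C_sqrt.T = C, so that C_sqrt @ z ~ N(0, C).
    """
    u = np.asarray(u0, dtype=float)
    phi_u = phi(u)
    chain, accepted = [u.copy()], 0
    for _ in range(steps):
        xi = C_sqrt @ rng.standard_normal(u.size)     # xi ~ N(0, C)
        v = np.sqrt(1.0 - s**2) * u + s * xi          # v ~ N(sqrt(1 - s^2) u, s^2 C)
        phi_v = phi(v)
        if np.log(rng.random()) < phi_u - phi_v:      # accept with probability alpha(u, v)
            u, phi_u = v, phi_v
            accepted += 1
        chain.append(u.copy())
    return np.array(chain), accepted / steps


# Hypothetical one-dimensional check: prior N(0, 1), Phi(u) = (u - 1)^2 / 2, so the
# target is N(0.5, 0.5).
rng = np.random.default_rng(2)
chain, rate = pcn_metropolis(lambda u: 0.5 * float(np.sum((u - 1.0) ** 2)),
                             np.eye(1), s=0.5, u0=[0.0], steps=40_000, rng=rng)
print(chain[1000:].mean(), chain[1000:].var(), rate)
```

With $\Phi \equiv 0$ the target equals $\mu_0$ and every pCN proposal is accepted, which reflects the $\mu_0$-reversibility of $P_0$.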

Next, we generalize the pCN Metropolis algorithm to allow for proposal kernels which employ a different covariance structure than the covariance of $\mu_0$.


3 Metropolis with gpCN proposals 

In recent years many authors have proposed and pursued the idea to construct proposals which try to exploit certain geometrical features of the target measure, see for example [11],[22],[17],[5].

We consider generalized pCN (gpCN) proposals which aim to adapt to the covariance structure of the target measure $\mu$. We motivate our gpCN proposal, show that it is well-defined in function spaces, and illustrate its superior performance in a simple but common setting.

3.1 Motivation from Bayesian inference 

We briefly recall the Bayesian framework for inverse problems and refer to [9] for an overview and to [28] for a comprehensive introduction to the topic.

Assume $X$ is a random variable on $\mathcal{H}$ with distribution $\mu_0 = N(0, C)$. Here $\mu_0$ is called the prior distribution and describes our initial uncertainty about $X$. Let $Y$ be a random variable on $\mathbb{R}^n$ given by

$$Y = G(X) + \varepsilon \qquad (9)$$

with a continuous map $G\colon \mathcal{H} \to \mathbb{R}^n$ and $\varepsilon \sim N(0, \Sigma)$, independent of $X$, with a symmetric positive definite matrix $\Sigma \in \mathbb{R}^{n \times n}$. The variable $Y$ models an observable quantity depending on $X$ via the map $G$, perturbed by additive noise $\varepsilon$. Then, given some observation $y \in \mathbb{R}^n$ of $Y$, we want to infer $X$, i.e., we are interested in the conditional distribution of $X$ given the event $Y = y$. We denote this conditional distribution by $\mu$ and call it the posterior distribution. In particular, in this setting $\mu$ admits a representation of the form (1) with

$$\Phi(u) = \frac{1}{2}\,\|y - G(u)\|_{\Sigma^{-1}}^2, \qquad (10)$$

where $\|x\|_{\Sigma^{-1}}^2 := x^\top \Sigma^{-1} x$ for $x \in \mathbb{R}^n$.

A special situation appears if $G(u) = Lu + b$ with a linear mapping $L\colon \mathcal{H} \to \mathbb{R}^n$ and $b \in \mathbb{R}^n$. Then, it is known from [21] that $\mu = N(m, \widehat{C})$ with

$$m = C L^*(L C L^* + \Sigma)^{-1}(y - b), \qquad \widehat{C} = (C^{-1} + L^* \Sigma^{-1} L)^{-1}, \qquad (11)$$

where $L^*$ denotes the adjoint operator of $L$. If we want to sample approximately from a Gaussian target measure $\mu = N(m, \widehat{C})$ by Metropolis algorithms with Gaussian proposals, it seems beneficial to employ $\widehat{C}$ as proposal covariance, see for example [29],[25],[17]. Intuitively, since the Gaussian proposal then possesses the same principal directions and the same ratio of variances as the Gaussian target measure, the proposed states should be accepted more often than for other proposals; see also Figure 1 for an illustration. This leads to a higher average acceptance probability and, thus, a faster exploration of the state space.
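For matrices, (11) can be evaluated directly. The NumPy sketch below uses a hypothetical discretization ($C = \operatorname{diag}(k^{-2})$ on $\mathbb{R}^5$, a random $4\times 5$ matrix $L$, $\Sigma = 0.01 I$) and checks two standard identities: $m = \widehat{C} L^* \Sigma^{-1}(y-b)$ and the Woodbury form $\widehat{C} = C - CL^*(LCL^* + \Sigma)^{-1}LC$.

```python
import numpy as np


def linear_gaussian_posterior(C, L, Sigma, y, b):
    """Posterior N(m, C_hat) from (11) for Y = L X + b + eps, X ~ N(0, C), eps ~ N(0, Sigma)."""
    m = C @ L.T @ np.linalg.solve(L @ C @ L.T + Sigma, y - b)
    C_hat = np.linalg.inv(np.linalg.inv(C) + L.T @ np.linalg.solve(Sigma, L))
    return m, C_hat


# Hypothetical discretization: C = diag(k^{-2}) on R^5 and four noisy linear observations.
rng = np.random.default_rng(3)
C = np.diag(1.0 / np.arange(1, 6) ** 2)
L = rng.standard_normal((4, 5))
Sigma = 0.01 * np.eye(4)
b = np.zeros(4)
y = rng.standard_normal(4)
m, C_hat = linear_gaussian_posterior(C, L, Sigma, y, b)
print(m)
```

Since $C - \widehat{C}$ is positive semidefinite, the posterior is always at least as concentrated as the prior, which is what a proposal with covariance $\widehat{C}$ exploits.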

The affine case indicates how we can construct good Gaussian proposal kernels if the map $G$ is nonlinear but smooth. For a fixed $u_0 \in \mathcal{H}$, local linearization leads to

$$G(u) = G(u_0) + \nabla G(u_0)(u - u_0) + r(u)$$

with a remainder term $r(u) \in \mathbb{R}^n$. For a sufficiently smooth $G$ the remainder $r$ is small (in a neighborhood of $u_0$), so that

$$\widetilde{G}(u) = G(u_0) + \nabla G(u_0)(u - u_0)$$

is close to $G(u)$ (in a neighborhood of $u_0$). The substitution of $G$ by $\widetilde{G}$ in the model (9) leads to a Gaussian target measure $\widetilde{\mu} = N(\widetilde{m}, \widetilde{C})$ with covariance

$$\widetilde{C} = (C^{-1} + L^* \Sigma^{-1} L)^{-1}, \qquad L = \nabla G(u_0).$$
Figure 1: For a Gaussian target measure $\mu = N(m, \widehat{C})$ and current state $u$, the region of acceptance $\{v : \alpha(u, v) = 1\}$ (dark grey region) as well as two regions of possible rejection $\{v : p_1 < \alpha(u, v) < p_2 < 1\}$ (lighter grey regions) are displayed. Moreover, we present the contour lines (blue and red, resp.) of Gaussian proposals with covariance $s^2 I$ in part (a) and with the target covariance, $N(u, s^2 \widehat{C})$, in part (b).

Since $G$ and $\widetilde{G}$ are close, the measures $\mu$ and $\widetilde{\mu}$ are close as well. It is then reasonable to use $\widetilde{C}$ as the covariance operator of the proposal in a Metropolis algorithm. Of course, there might be other choices besides a simple linearization of $G$ at one point. For example, averaging linearizations at several points $u_1, \dots, u_N \in \mathcal{H}$ leads to

$$\widetilde{C} = \Big(C^{-1} + \frac{1}{N}\sum_{n=1}^{N} \nabla G(u_n)^* \Sigma^{-1} \nabla G(u_n)\Big)^{-1}.$$

Natural candidates for the points $u_1, \dots, u_N$ are samples according to the prior or samples taken from a short run of a preliminary Markov chain with the posterior as stationary measure, cf. the adaptive method in [5, Section 3.4]. One could also think of a state-dependent covariance $\widetilde{C}(u)$. This motivates the study of proposals which use covariances of the form $C_\Gamma = (C^{-1} + \Gamma)^{-1}$ for suitably chosen operators $\Gamma$.
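The averaged construction can be sketched for a finite-dimensional toy problem. In the NumPy code below the forward map `G`, the prior covariance, the noise covariance and the use of central finite differences for $\nabla G$ are all hypothetical choices made for illustration; the points $u_1, \dots, u_N$ are drawn from the prior, one of the options mentioned above.

```python
import numpy as np


def averaged_gamma(jacobians, Sigma):
    """Gamma = (1/N) sum_n grad G(u_n)^* Sigma^{-1} grad G(u_n) from N Jacobian matrices."""
    return sum(J.T @ np.linalg.solve(Sigma, J) for J in jacobians) / len(jacobians)


def covariance_c_gamma(C, Gamma):
    """C_Gamma = (C^{-1} + Gamma)^{-1}, computed here by dense inversion."""
    return np.linalg.inv(np.linalg.inv(C) + Gamma)


def G(u):
    """Hypothetical smooth nonlinear forward map R^3 -> R^2."""
    return np.array([np.sin(u[0]) + u[1] ** 2, u[0] * u[2]])


def jacobian_fd(G, u, h=1e-6):
    """Central finite-difference approximation of grad G(u)."""
    cols = [(G(u + h * e) - G(u - h * e)) / (2 * h) for e in np.eye(len(u))]
    return np.column_stack(cols)


rng = np.random.default_rng(4)
C = np.diag([1.0, 0.25, 1.0 / 9.0])
Sigma = 0.05 * np.eye(2)
points = [rng.multivariate_normal(np.zeros(3), C) for _ in range(10)]   # prior samples
Gamma = averaged_gamma([jacobian_fd(G, u) for u in points], Sigma)
C_Gamma = covariance_c_gamma(C, Gamma)
print(np.diag(C_Gamma))
```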

3.2 Well-defined gpCN proposals 

In this section we introduce the gpCN proposal kernel and prove that the Metropolis algorithm with this proposal is well-defined in the sense that it leads to a $\mu$-reversible transition kernel.

For this we introduce the set $\mathcal{L}_+(\mathcal{H})$ of all bounded, self-adjoint and positive linear operators $\Gamma\colon \mathcal{H} \to \mathcal{H}$. We define the operators

$$C_\Gamma := (C^{-1} + \Gamma)^{-1}, \qquad \Gamma \in \mathcal{L}_+(\mathcal{H}), \qquad (12)$$

motivated in Section 3.1, where $C$ denotes the covariance operator of the prior measure $\mu_0 = N(0, C)$, for which we also use the equivalent representation

$$C_\Gamma = C^{1/2}(I + H_\Gamma)^{-1} C^{1/2}, \qquad H_\Gamma := C^{1/2}\,\Gamma\,C^{1/2}. \qquad (13)$$

In the following we prove that $C_\Gamma$ can be considered as a covariance operator.

Proposition 1. Let $C$ be a nonsingular covariance operator on $\mathcal{H}$, $\Gamma \in \mathcal{L}_+(\mathcal{H})$, and $C_\Gamma$ with $H_\Gamma$ given as in (13). Then $H_\Gamma \in \mathcal{L}_+(\mathcal{H})$ is trace class and $C_\Gamma$ is also a nonsingular covariance operator on $\mathcal{H}$.

Proof. That $H_\Gamma \in \mathcal{L}_+(\mathcal{H})$ follows by construction. Furthermore, since $H_\Gamma$ is a composition of two Hilbert-Schmidt operators and one bounded operator, namely $C^{1/2}$ and $\Gamma$, respectively, it is trace class [6, Proposition 1.1.2]. Since $H_\Gamma$ is self-adjoint and compact, we have from Fredholm operator theory that the operator $I + H_\Gamma$ is invertible iff $\ker(I + H_\Gamma) = \{0\}$. The latter is the case since $H_\Gamma$ is positive, which implies $\langle (I + H_\Gamma)u, u\rangle \ge \langle u, u\rangle$. Hence, the inverse $(I + H_\Gamma)^{-1}$ exists and, moreover, $(I + H_\Gamma)^{-1} \in \mathcal{L}_+(\mathcal{H})$ with $\|(I + H_\Gamma)^{-1}\| \le 1$. The self-adjointness and positivity of $C_\Gamma$ follow immediately, and since $C_\Gamma$ is a composition of two nonsingular Hilbert-Schmidt operators and a nonsingular bounded operator, namely $C^{1/2}$ and $(I + H_\Gamma)^{-1}$, respectively, it is trace class and nonsingular as well. □
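A quick finite-dimensional sanity check of the representation (13) and of the bound $\|(I + H_\Gamma)^{-1}\| \le 1$ from the proof, assuming NumPy; the matrix square root via an eigendecomposition, the diagonal $C$ and the randomly generated $\Gamma$ are our illustrative choices.

```python
import numpy as np


def sym_power(M, p):
    """M^p for a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


rng = np.random.default_rng(5)
d = 6
C = np.diag(1.0 / np.arange(1, d + 1) ** 2)         # hypothetical prior covariance
B = rng.standard_normal((d, d))
Gamma = B @ B.T                                     # some Gamma in L_+(R^d)

C_half = sym_power(C, 0.5)
H_Gamma = C_half @ Gamma @ C_half                   # H_Gamma = C^{1/2} Gamma C^{1/2}
resolvent = np.linalg.inv(np.eye(d) + H_Gamma)      # (I + H_Gamma)^{-1}
C_Gamma_13 = C_half @ resolvent @ C_half            # representation (13)
C_Gamma_12 = np.linalg.inv(np.linalg.inv(C) + Gamma)  # definition (12)
print(np.max(np.abs(C_Gamma_13 - C_Gamma_12)))
```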

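By Proposition 1 we can use the covariance operator $C_\Gamma$ for constructing proposal kernels. Specifically, we consider

$$P(u, \cdot) = N(Au, s^2 C_\Gamma), \qquad s \in [0,1),\ \Gamma \in \mathcal{L}_+(\mathcal{H}), \qquad (14)$$

where $A\colon \mathcal{H} \to \mathcal{H}$ denotes a suitably chosen bounded linear operator on $\mathcal{H}$. Here $A$ should be chosen such that $P$ is $\mu_0$-reversible, which ensures that the Metropolis kernel with proposal $P$ and acceptance probability $\alpha(u, v) = \min\{1, \exp(\Phi(u) - \Phi(v))\}$ is $\mu$-reversible, see Section 2.2. By applying (34) we obtain in this setting

$$P(u, dv)\,\mu_0(du) = N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} C & CA^* \\ AC & ACA^* + s^2 C_\Gamma \end{bmatrix} \right)$$

and

$$P(v, du)\,\mu_0(dv) = N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} ACA^* + s^2 C_\Gamma & AC \\ CA^* & C \end{bmatrix} \right).$$

Thus, for satisfying (6) we need to choose $A$ so that

$$AC = CA^*, \qquad ACA^* + s^2 C_\Gamma = C. \qquad (15)$$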
By a straightforward calculation we obtain as the formal solution to (15)

$$A = A_\Gamma = C^{1/2}\big(I - s^2 (I + H_\Gamma)^{-1}\big)^{1/2} C^{-1/2}. \qquad (16)$$
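In finite dimensions, where $C^{-1/2}$ is a bounded matrix, (16) can be computed directly and the two conditions in (15) checked numerically. The NumPy sketch below does this for a hypothetical diagonal $C$ and a random positive semidefinite $\Gamma$; it also confirms that $\Gamma = 0$ gives $A_\Gamma = \sqrt{1-s^2}\,I$ and $C_\Gamma = C$, i.e., the pCN proposal. It says nothing about the infinite-dimensional boundedness question addressed by the lemma.

```python
import numpy as np


def sym_power(M, p):
    """M^p for a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


def gpcn_mean_operator(C, Gamma, s):
    """A_Gamma from (16) and C_Gamma from (13), computed for matrices."""
    d = C.shape[0]
    C_half, C_minus_half = sym_power(C, 0.5), sym_power(C, -0.5)
    resolvent = np.linalg.inv(np.eye(d) + C_half @ Gamma @ C_half)
    inner = np.eye(d) - s**2 * resolvent
    inner = 0.5 * (inner + inner.T)                 # symmetrize against round-off
    A = C_half @ sym_power(inner, 0.5) @ C_minus_half
    return A, C_half @ resolvent @ C_half


rng = np.random.default_rng(6)
d, s = 5, 0.4
C = np.diag(1.0 / np.arange(1, d + 1) ** 2)         # hypothetical prior covariance
B = rng.standard_normal((d, d))
Gamma = B @ B.T
A, C_Gamma = gpcn_mean_operator(C, Gamma, s)
print(np.max(np.abs(A @ C @ A.T + s**2 * C_Gamma - C)))   # residual of (15)
```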

The following lemma ensures that this choice of $A$ yields a well-defined bounded linear operator on $\mathcal{H}$.

Lemma 2. Let the assumptions of Proposition 1 be satisfied and let $s \in [0,1)$. Then (16) defines a bounded linear operator $A_\Gamma\colon \operatorname{Im} C^{1/2} \to \mathcal{H}$.

The well-definedness of $A_\Gamma\colon \operatorname{Im} C^{1/2} \to \mathcal{H}$ follows rather easily, whereas its boundedness is not trivial. Namely, one can easily construct a bounded $B \in \mathcal{L}(\mathcal{H})$ such that $C^{1/2} B C^{-1/2}$ is unbounded on $\operatorname{Im} C^{1/2}$. Since the proof of Lemma 2 is rather technical, it is postponed to Appendix B.1.

Lemma 2 now allows us to extend $A_\Gamma$ to $\mathcal{H}$ by continuity, because the Cameron-Martin space $\operatorname{Im} C^{1/2}$ is a dense subspace of $\mathcal{H}$. For simplicity we denote this continuous extension again by $A_\Gamma\colon \mathcal{H} \to \mathcal{H}$.

Definition 3 (gpCN proposal). For $s \in [0,1)$ and $\Gamma \in \mathcal{L}_+(\mathcal{H})$ the generalized pCN proposal kernel is given by

$$P_\Gamma(u, \cdot) := N(A_\Gamma u, s^2 C_\Gamma). \qquad (17)$$

For the zero operator $\Gamma = 0$ we recover the pCN proposal. By Lemma 2 and the arguments given in Section 2.2 we obtain the following important result.

Corollary 4. Let $\mu_0 = N(0, C)$ and $\mu$ be given by (1). Let the assumptions of Lemma 2 be satisfied. Then, a gpCN proposal kernel $P_\Gamma$ given by (17) and the acceptance probability $\alpha(u, v) = \min\{1, \exp(\Phi(u) - \Phi(v))\}$ induce a $\mu$-reversible Metropolis kernel, denoted by $M_\Gamma$.
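Putting (13), (16) and (17) together gives a finite-dimensional sketch of the gpCN Metropolis algorithm. The NumPy code below is an illustration, not the authors' implementation; the function names, the two-dimensional linear test problem and the Gauss-Newton choice $\Gamma = L^*\Sigma^{-1}L$ (which makes $C_\Gamma$ equal to the posterior covariance in (11)) are assumptions for the example. The acceptance step is the same as for pCN.

```python
import numpy as np


def sym_power(M, p):
    """M^p for a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


def gpcn_metropolis(phi, C, Gamma, s, u0, steps, rng):
    """gpCN Metropolis in R^d: proposal (17) with the unchanged acceptance probability."""
    d = C.shape[0]
    C_half = sym_power(C, 0.5)
    resolvent = np.linalg.inv(np.eye(d) + C_half @ Gamma @ C_half)
    inner = np.eye(d) - s**2 * resolvent
    A = C_half @ sym_power(0.5 * (inner + inner.T), 0.5) @ sym_power(C, -0.5)  # (16)
    C_Gamma = C_half @ resolvent @ C_half                                      # (13)
    L_Gamma = np.linalg.cholesky(0.5 * (C_Gamma + C_Gamma.T))   # factor for N(0, C_Gamma)
    u = np.asarray(u0, dtype=float)
    phi_u, chain, accepted = phi(u), [u.copy()], 0
    for _ in range(steps):
        v = A @ u + s * (L_Gamma @ rng.standard_normal(d))  # v ~ N(A_Gamma u, s^2 C_Gamma)
        phi_v = phi(v)
        if np.log(rng.random()) < phi_u - phi_v:             # alpha = min{1, exp(Phi(u) - Phi(v))}
            u, phi_u = v, phi_v
            accepted += 1
        chain.append(u.copy())
    return np.array(chain), accepted / steps


rng = np.random.default_rng(7)
C = np.diag([1.0, 0.25])                 # hypothetical prior covariance
L = np.array([[1.0, 1.0]])               # hypothetical linear forward map
sigma2, y = 0.01, np.array([1.0])
phi = lambda u: 0.5 * float(np.sum((y - L @ u) ** 2)) / sigma2
Gamma = L.T @ L / sigma2                 # Gauss-Newton choice, exact for a linear map
chain, rate = gpcn_metropolis(phi, C, Gamma, s=0.5, u0=np.zeros(2), steps=40_000, rng=rng)
m = C @ L.T @ np.linalg.solve(L @ C @ L.T + sigma2 * np.eye(1), y)   # posterior mean (11)
print(chain[1000:].mean(axis=0), m, rate)
```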

For simplicity we also call the Metropolis algorithm with transition kernel $M_\Gamma$ just gpCN Metropolis. There are connections of the gpCN Metropolis to other recently developed Metropolis algorithms for general Hilbert spaces which also use more sophisticated choices for the proposal than the pCN proposal. The following two remarks address these connections.

Remark 5. The gpCN proposals form a subclass of the operator weighted proposals introduced in [5],[17]. The particular form of the gpCN proposal allows us to derive properties such as boundedness of the "proposal mean operator" $A_\Gamma$ and the convergence of the resulting Markov chain, see Section 4. These issues were left open in [5],[17].
Remark 6. In [23] the authors compute a Gaussian measure $\mu^* = N(m^*, C^*)$ which comes closest to $\mu$ w.r.t. the Kullback-Leibler divergence. The admissible class of Gaussian measures considered there is closely related to our parametrized proposal covariances $C_\Gamma$, although their class of Gaussian measures is slightly larger. The measure $\mu^*$ is then used to construct a proposal kernel $P^*(u, \cdot) = N(m^* + \sqrt{1 - s^2}\,(u - m^*),\, s^2 C^*)$ for Metropolis algorithms. Note that $P^*$ is not $\mu_0$-reversible but $\mu^*$-reversible, since it is a pCN proposal given the prior $\mu^*$. In order to obtain a $\mu$-reversible Metropolis kernel the authors need to adapt the acceptance probability by including terms of the density of $\mu^*$ w.r.t. $\mu_0$, cf. Section 5. Thus, the authors of [23] also use a covariance operator different from the prior covariance in a pCN proposal in order to increase the efficiency of the Metropolis algorithm. The difference to our approach is the way they ensure the $\mu$-reversibility of the algorithm: they keep the mean of the original pCN proposal and modify the acceptance probability, whereas we also modify the mean of the proposal to maintain its $\mu_0$-reversibility and, therefore, can leave the acceptance probability unchanged.

3.3 Numerical illustrations 

We illustrate the gpCN Metropolis algorithm for approximating samples of a posterior distribution in Bayesian inference. In particular, we compare different Metropolis algorithms and investigate which of them perform independently of the state space dimension and of the variance of the involved noise.

We consider the same setting and inference problem as in [23, Section 6.1]. Assume noisy observations $y_j = p(0.2j) + \varepsilon_j$, $j = 1, \dots, 4$, of the solution $p$ of

$$-\frac{d}{dx}\Big(e^{u(x)}\,\frac{dp}{dx}(x)\Big) = 0, \qquad p(0) = 0,\quad p(1) = 2, \qquad (18)$$

on $D = [0,1]$ are given and we want to infer $u$. Here the $\varepsilon_j$ are independent realizations of the normal distribution $N(0, \sigma_\varepsilon^2)$. We place a Gaussian prior $N(0, A^{-1})$ with $A = -\frac{d^2}{dx^2}$ on the completion $\mathcal{H}_c$ of $H_0^1(D) \cap H^2(D)$ in $L^2(D)$. Recall that $(\Omega, \mathcal{A}, \mathbb{P})$ is a probability space and let $U\colon \Omega \to \mathcal{H}_c \subset L^2(D)$ be a random function with distribution $N(0, A^{-1})$. This allows us to represent the random function $U$ as

$$U(\omega)(x) = \frac{\sqrt{2}}{\pi} \sum_{k=1}^{\infty} \xi_k(\omega) \sin(k\pi x), \qquad \xi_k \sim N(0, k^{-2}), \qquad (19)$$

$\mathbb{P}$-a.s., where all random variables $\xi_k$ are independent. Thus, inference on $u$ is equivalent to inference on $\xi = (\xi_k)_{k\in\mathbb{N}}$. This leads to the prior $\mu_0$ for $\xi$ on $\mathcal{H} := \ell^2$ given by $\mu_0 = N(0, C)$ with $C = \operatorname{diag}\{k^{-2} : k \in \mathbb{N}\}$. Further, we denote by $\mu$ the resulting conditional distribution of $\xi$ given the observed data $y_1, \dots, y_4$. The measure $\mu$ is given by a density of the form (1) with $\Phi$ as in (10), where $\Sigma = \sigma_\varepsilon^2 I$ and $G(\xi)$ is the mapping

$$G\colon \xi \mapsto \big(p(0.2j, \xi)\big)_{j=1}^{4}.$$
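The forward map $G$ can be sketched using the explicit solution of (18), $p(x) = 2 S_x(e^{-u}) / S_1(e^{-u})$ with $S_x(f) = \int_0^x f(y)\,dy$, together with the truncated expansion (19) and the trapezoidal rule. The NumPy code below is an illustration; the grid resolution `num_cells` and the use of linear interpolation at the observation points $0.2j$ are hypothetical choices, not taken from the text. For $\xi = 0$ we have $u \equiv 0$ and $p(x) = 2x$, which serves as a check.

```python
import numpy as np


def forward_map(xi, num_cells=512, obs_points=(0.2, 0.4, 0.6, 0.8)):
    """Sketch of G(xi) = (p(0.2j, xi))_{j=1..4} for problem (18), truncated after len(xi) terms.

    Uses p(x) = 2 S_x(e^{-u}) / S_1(e^{-u}) with the cumulative trapezoidal rule on a uniform
    grid; num_cells is a hypothetical resolution.
    """
    x = np.linspace(0.0, 1.0, num_cells + 1)
    k = np.arange(1, len(xi) + 1)
    u = np.sqrt(2.0) / np.pi * (np.sin(np.pi * np.outer(x, k)) @ xi)    # truncated (19)
    w = np.exp(-u)
    S = np.concatenate(([0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))))  # S_x(e^{-u})
    p = 2.0 * S / S[-1]
    return np.interp(obs_points, x, p)                  # p at 0.2 j, j = 1, ..., 4


xi = np.random.default_rng(8).standard_normal(50) / np.arange(1, 51)   # a draw from the prior
print(forward_map(np.zeros(10)), forward_map(xi))
```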

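We test the performance of $\mu$-reversible Metropolis algorithms for computing expectations w.r.t. $\mu$ of a function $f\colon \mathcal{H} \to \mathbb{R}$. We consider four Metropolis algorithms, denoted by RW, pCN, GN-RW and gpCN, with different proposal kernels:

• RW: Gaussian random walk proposal $P_1(\xi, \cdot) = N(\xi, s^2 C)$,

• pCN: pCN proposal $P_2(\xi, \cdot) = N(\sqrt{1 - s^2}\,\xi, s^2 C)$,

• GN-RW: Gauss-Newton random walk proposal $P_3(\xi, \cdot) = N(\xi, s^2 C_\Gamma)$,

• gpCN: gpCN proposal $P_4(\xi, \cdot) = N(A_\Gamma \xi, s^2 C_\Gamma)$.

Here we choose $\Gamma = \sigma_\varepsilon^{-2} L^* L$ with $L = \nabla G(\xi_{\mathrm{MAP}})$ and

$$\xi_{\mathrm{MAP}} = \operatorname*{argmin}_{\xi \in \operatorname{Im} C^{1/2}} \Big( \sigma_\varepsilon^{-2}\,|y - G(\xi)|^2 + \|C^{-1/2}\xi\|^2 \Big).$$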


The solution of (18) is given by $p(x) = 2 S_x(e^{-u}) / S_1(e^{-u})$ with $S_x(f) = \int_0^x f(y)\,dy$ and, thus, the gradient $\nabla G(\xi)$ can be easily computed by differentiating the explicit formula for $p$ w.r.t. $\xi$.¹ Furthermore, we apply the Levenberg-Marquardt algorithm to solve the above optimization problem for the MAP estimator $\xi_{\mathrm{MAP}}$. For all Metropolis algorithms we tune $s$ such that the average acceptance rate is about 0.25.² As a metric for comparison we consider and estimate the effective sample size

$$\mathrm{ESS} = \mathrm{ESS}\big(n, f, (\xi_k)_{k\in\mathbb{N}}\big) = n \Big(1 + 2 \sum_{k > 0} \gamma_f(k)\Big)^{-1}.$$

¹In general, elliptic PDEs can be solved in a weak sense by variational methods. Then adjoint methods known from PDE constrained optimization and parameter identification can be employed to compute $\nabla G(\xi)$, see [31, Chapter 6] for details.

²The empirical performance of each algorithm was best for this particular tuning.
Here $n$ is the number of samples taken from a Markov chain $(\xi_k)_{k\in\mathbb{N}}$ with, say, a Metropolis transition kernel $M$, and $\gamma_f$ denotes the autocorrelation function $\gamma_f(k) = \operatorname{Corr}(f(\xi_{n_0}), f(\xi_{n_0+k}))$ of a quantity of interest $f$.

The value of ESS corresponds to the number of independent samples w.r.t. $\mu$ which would approximately yield the same mean squared error as the MCMC estimator $S_{n,n_0}(f)$ for computing $\mathbb{E}_\mu(f)$. This can be justified under the assumption that $\xi_1 \sim \mu$, since then by virtue of [26, Proposition 3.26] we have

$$\lim_{n\to\infty} n\,\mathbb{E}\,|S_{n,n_0}(f) - \mathbb{E}_\mu(f)|^2 = \sigma_f^2 = \operatorname{Var}_\mu(f)\Big(1 + 2\sum_{k>0} \gamma_f(k)\Big),$$

where $\sigma_f^2$ denotes the asymptotic variance of the estimator $S_{n,n_0}(f)$ as in Section 2.1.

For the numerical simulations we use a uniform discretization of $[0,1]$ with $\Delta x = 2^{-6}$ and apply the trapezoidal rule for evaluating integrals w.r.t. $dx$.
Furthermore, we truncate the expansion (19) after $N$ terms, where we vary $N$ in order to test the Metropolis algorithms for dimension independent performance. The noise-free observations are generated by $u(x) = 2\sin(2\pi x)$. We also consider different noise levels to examine the effect of smaller variances $\sigma_\varepsilon^2$, which lead to more concentrated posterior distributions $\mu$, on the performance of the Metropolis algorithms. In all cases we take $n_0 = 10^{\ldots}$ as burn-in length and $n = 10^6$ as sample size. We use $f(\xi) := \int_0^1 \ldots\,dx$ as the quantity of interest.³ To estimate the ESS we use the initial monotone sequence estimator;⁴ for details we refer to [10, Section 3.3].
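The following NumPy sketch shows one way to implement an initial monotone sequence estimator for the ESS of a scalar chain $f(\xi_k)$; it is our own minimal illustration of the idea (pairing consecutive empirical autocorrelations, stopping at the first non-positive pair sum, and enforcing monotonicity), not the code used for the figures. The AR(1) usage example with coefficient 0.5 is hypothetical; its exact ESS is $n(1-0.5)/(1+0.5) = n/3$.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Empirical autocorrelations gamma_f(0), ..., gamma_f(max_lag) of a scalar chain."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c0 = x @ x / n
    return np.array([x[: n - k] @ x[k:] / (n * c0) for k in range(max_lag + 1)])


def ess_initial_monotone(x, max_lag=1000):
    """ESS = n / (1 + 2 sum_{k>0} gamma_f(k)) with an initial monotone sequence estimator."""
    n = len(x)
    rho = autocorrelation(x, min(max_lag, n - 1))
    pair_sums = []
    for m in range(len(rho) // 2):
        g = rho[2 * m] + rho[2 * m + 1]          # Gamma_m = gamma(2m) + gamma(2m+1)
        if g <= 0.0:                             # initial positive sequence stops here
            break
        pair_sums.append(g)
    # Enforce monotonicity: replace each Gamma_m by the running minimum.
    pair_sums = np.minimum.accumulate(np.array(pair_sums))
    tau = -1.0 + 2.0 * pair_sums.sum()           # integrated autocorrelation time
    return n / tau if tau > 0 else float("inf")


rng = np.random.default_rng(9)
n, phi_ar = 100_000, 0.5
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi_ar**2)             # start in stationarity
for t in range(1, n):
    x[t] = phi_ar * x[t - 1] + e[t]
print(ess_initial_monotone(x, max_lag=200), n * (1 - phi_ar) / (1 + phi_ar))
```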

The results of the simulations are illustrated in Figure 2 and Figure 3. The former displays the estimated autocorrelation functions $\gamma_f$ resulting from the four Metropolis algorithms for $N = 50$ and $\sigma_\varepsilon = 0.1$ in (a), for $N = 50$ and $\sigma_\varepsilon = 0.01$ in (b), for $N = 400$ and $\sigma_\varepsilon = 0.1$ in (c), and for $N = 400$ and $\sigma_\varepsilon = 0.01$ in (d). In Figure 3 we display the estimated ESS for varying $\sigma_\varepsilon = 0.1, 0.05, 0.025, 0.01$ with fixed $N = 100$ in (a) and for varying $N = 50, 100, 200, 400, 800$ with fixed $\sigma_\varepsilon = 0.1$ in (b).

We see in both figures that the performance of pCN and gpCN is independent of the dimension, and only GN-RW and gpCN perform robustly w.r.t. the noise variance. Thus, the gpCN Metropolis seems to be the only algorithm with both desirable properties. Intuitively, the variance independent performance might come from the fact that our choice of $C_\Gamma$ incorporates the noise covariance $\sigma_\varepsilon^2 I$ in the same way as the posterior covariance might depend on it. Thus, the smaller $\sigma_\varepsilon^2$ becomes, i.e., the more pronounced the change from prior to posterior is, the more pronounced is also the adaptation in the proposal covariance $C_\Gamma = (C^{-1} + \sigma_\varepsilon^{-2} L^* L)^{-1}$. Moreover, the gpCN Metropolis performs best among the four algorithms also in absolute terms of the ESS.

³We also studied other functions, for example $f(\xi) = p(0.5, \xi)$, but the results of the comparison were essentially the same.

⁴We also estimated the ESS by batch means (100 batches of size $10^4$) to control our simulations. This led to similar results.



Figure 2: Autocorrelation of $f$ (plotted against the lag) given samples generated by the four Metropolis algorithms denoted by RW, pCN, GN-RW and gpCN for: (a) state dimension $N = 50$ and noise standard deviation $\sigma_\varepsilon = 0.1$; (b) $N = 50$ and $\sigma_\varepsilon = 0.01$; (c) $N = 400$ and $\sigma_\varepsilon = 0.1$; (d) $N = 400$ and $\sigma_\varepsilon = 0.01$.
Figure 3: Dependence of the empirical ESS of each Metropolis algorithm RW, pCN, GN-RW and gpCN on: (a) the noise variance with fixed state dimension $N = 100$; (b) the state dimension with fixed noise variance $\sigma_\varepsilon^2 = 0.01$.

4 Qualitative comparison of gpCN Metropolis 

In this section we develop qualitative comparison arguments for Metropolis algorithms in a general setting and apply those results to the gpCN Metropolis algorithms. In particular, we relate the existence of a spectral gap for the gpCN Metropolis to the existence of a spectral gap of the pCN Metropolis. Here it is worth mentioning that in [13] sufficient conditions for the latter were proven under additional regularity assumptions on the function $\Phi$ in (1). With our approach we do not need to rely on those conditions and will benefit from any improvement of the results stated in [13].

We start by stating a general comparison result for the spectral gaps of Metropolis algorithms with equivalent proposals. We then verify the corresponding assumptions for the gpCN Metropolis: positivity and equivalence to the pCN proposal. In order to derive our main theorem, we consider in Section 4.4 restrictions of the target measure $\mu$ to arbitrary $R$-balls in $\mathcal{H}$ and prove convergence of the gpCN Metropolis to these restricted measures.

4.1 Comparison of spectral gaps 

Let $K$ be a $\mu$-reversible transition kernel on $\mathcal H$, i.e., the associated Markov operator $K\colon L^2(\mu)\to L^2(\mu)$ is self-adjoint. Let the largest element of the spectrum $\mathrm{spec}(K \mid L^2_0(\mu))$ be given by

$$\Lambda(K) := \sup\{\lambda : \lambda\in\mathrm{spec}(K\mid L^2_0(\mu))\}$$
and define the conductance of $K$ (w.r.t. $\mu$) by

$$\varphi(K) := \inf_{\mu(A)\in(0,1/2]}\frac{\int_A K(u,A^c)\,\mu(\mathrm du)}{\mu(A)}.$$

Under the assumptions above, the Cheeger inequality for Markov operators, see [18], given by

$$\frac{\varphi(K)^2}{2}\le 1-\Lambda(K)\le 2\varphi(K), \qquad (20)$$

provides a useful relation between $\Lambda(K)$ and the conductance $\varphi(K)$.
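To make (20) concrete, the following sketch checks it on a toy finite state space, which is our own illustrative assumption and not part of the Hilbert space setting: a Metropolis chain with arbitrarily chosen target weights and a nearest-neighbour proposal, made lazy so that its Markov operator is positive. The conductance is computed by brute force over all sets $A$ with $\mu(A)\le 1/2$, and $\Lambda(K)$ by power iteration on mean-zero functions (valid here because the lazy kernel is positive). All function names are hypothetical.

```python
import itertools
import math

def metropolis_kernel(w, P):
    """Metropolis kernel on {0,...,n-1} for target weights w and a symmetric proposal P."""
    n = len(w)
    M = [[P[i][j] * min(1.0, w[j] / w[i]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        M[i][i] = 1.0 - sum(M[i])  # rejected or self-proposed moves stay at i
    return M

def conductance(K, mu):
    """phi(K): infimum over mu(A) in (0, 1/2] of the flow out of A divided by mu(A)."""
    n = len(mu)
    best = math.inf
    for r in range(1, n):
        for A in itertools.combinations(range(n), r):
            mass = sum(mu[i] for i in A)
            if mass <= 0.5:
                flow = sum(mu[i] * K[i][j] for i in A for j in range(n) if j not in A)
                best = min(best, flow / mass)
    return best

def largest_eigenvalue_mean_zero(K, mu, iters=2000):
    """Power iteration on L^2_0(mu); equals Lambda(K) for a positive, mu-reversible K."""
    n = len(mu)
    f = [math.sin(i + 1.0) for i in range(n)]  # generic starting vector
    rq = 0.0
    for _ in range(iters):
        m = sum(mu[i] * f[i] for i in range(n))
        f = [x - m for x in f]  # project onto mean-zero functions
        nrm = math.sqrt(sum(mu[i] * f[i] ** 2 for i in range(n)))
        f = [x / nrm for x in f]
        g = [sum(K[i][j] * f[j] for j in range(n)) for i in range(n)]
        rq = sum(mu[i] * g[i] * f[i] for i in range(n))  # Rayleigh quotient (Kf, f)_mu
        f = g
    return rq

n = 6
w = [1.0, 2.0, 4.0, 3.0, 1.0, 2.0]
mu = [x / sum(w) for x in w]
P = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    P[i][i + 1] = P[i + 1][i] = 0.5
P[0][0] = P[n - 1][n - 1] = 0.5
M = metropolis_kernel(w, P)
K = [[0.5 * ((i == j) + M[i][j]) for j in range(n)] for i in range(n)]  # lazy, hence positive
phi = conductance(K, mu)
gap = 1.0 - largest_eigenvalue_mean_zero(K, mu)
print(f"phi^2/2 = {phi**2/2:.4f} <= 1 - Lambda = {gap:.4f} <= 2 phi = {2*phi:.4f}")
```

Brute-force enumeration of sets is only feasible for a handful of states; it is meant as a sanity check of (20), not as a method.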

Let us assume that $M_1$ and $M_2$ are $\mu$-reversible transition kernels of Metropolis algorithms with the same acceptance probability $\alpha$ and proposals $P_1$ and $P_2$, respectively. Then we obtain the following result.

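Lemma 7. Let $\mu$ be a probability measure on $\mathcal H$ and, for $i = 1,2$, let

$$M_i(u,\mathrm dv) = \alpha(u,v)P_i(u,\mathrm dv) + \delta_u(\mathrm dv)\int_{\mathcal H}(1-\alpha(u,w))\,P_i(u,\mathrm dw)$$

be Metropolis kernels. Assume that for any $u\in\mathcal H$ the Radon-Nikodym derivative of $P_1(u,\cdot)$ w.r.t. $P_2(u,\cdot)$ exists, i.e., the proposal kernels admit a density

$$\rho(u,v) := \frac{\mathrm dP_1(u)}{\mathrm dP_2(u)}(v), \qquad u,v\in\mathcal H.$$

If for a number $p>1$ we have

$$K_p := \sup_{\mu(A)\in(0,1/2]}\frac{\int_A\int_{A^c}\rho(u,v)^p\,P_2(u,\mathrm dv)\,\mu(\mathrm du)}{\mu(A)} < \infty, \qquad (21)$$

then

$$\varphi(M_1)\le\varphi(M_2)^{(p-1)/p}\,K_p^{1/p}.$$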

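Proof. Let $A\in\mathcal B(\mathcal H)$ with $\mu(A)\in(0,1/2]$. Further, let $q = p/(p-1)$ such that $1/q + 1/p = 1$. Then

$$\int_A M_1(u,A^c)\,\mu(\mathrm du) = \int_{\mathcal H}\int_{\mathcal H}1_{A^c}(v)1_A(u)\alpha(u,v)\,P_1(u,\mathrm dv)\,\mu(\mathrm du) = \int_{\mathcal H}\int_{\mathcal H}1_{A^c}(v)1_A(u)\alpha(u,v)\rho(u,v)\,P_2(u,\mathrm dv)\,\mu(\mathrm du).$$

Note that $P_2(u,\mathrm dv)\mu(\mathrm du)$ is a probability measure on $(\mathcal H\times\mathcal H,\mathcal B(\mathcal H\times\mathcal H))$ and we can apply Hölder's inequality w.r.t. this measure with parameters $p$ and $q$. Thus, by using $\alpha(u,v) = \alpha(u,v)^{1/q}\alpha(u,v)^{1/p}$ and $\alpha\le 1$, we obtain

$$\begin{aligned}
\int_A M_1(u,A^c)\,\mu(\mathrm du) &\le \Big(\int_A M_2(u,A^c)\,\mu(\mathrm du)\Big)^{1/q}\Big(\int_A\int_{A^c}\rho(u,v)^p\alpha(u,v)\,P_2(u,\mathrm dv)\,\mu(\mathrm du)\Big)^{1/p}\\
&\le \Big(\int_A M_2(u,A^c)\,\mu(\mathrm du)\Big)^{1/q}\Big(\int_A\int_{A^c}\rho(u,v)^p\,P_2(u,\mathrm dv)\,\mu(\mathrm du)\Big)^{1/p}.
\end{aligned}$$

Dividing by $\mu(A)$, applying $\mu(A) = \mu(A)^{1/q}\mu(A)^{1/p}$ and taking the infimum yields

$$\varphi(M_1)\le\varphi(M_2)^{1/q}K_p^{1/p}.$$

□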
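Lemma 7 can be checked numerically in a finite analogue, which is our own illustrative assumption: two Metropolis kernels for the same (arbitrarily chosen) target weights and the same acceptance probability $\min\{1, w_j/w_i\}$, with $P_1$ a nearest-neighbour proposal and $P_2$ the uniform proposal, so that $\rho = \mathrm dP_1/\mathrm dP_2$ exists. The sketch computes $\varphi(M_1)$, $\varphi(M_2)$ and $K_p$ from (21) by enumerating all admissible sets; the helper names are hypothetical.

```python
import itertools

def metropolis_kernel(w, P):
    """Metropolis kernel for target weights w and a symmetric proposal P."""
    n = len(w)
    M = [[P[i][j] * min(1.0, w[j] / w[i]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        M[i][i] = 1.0 - sum(M[i])
    return M

def admissible_sets(mu):
    """All sets A (as index tuples) with mu(A) in (0, 1/2]."""
    n = len(mu)
    for r in range(1, n):
        for A in itertools.combinations(range(n), r):
            if sum(mu[i] for i in A) <= 0.5:
                yield A

def conductance(K, mu):
    n = len(mu)
    return min(
        sum(mu[i] * K[i][j] for i in A for j in range(n) if j not in A)
        / sum(mu[i] for i in A)
        for A in admissible_sets(mu)
    )

def K_p(rho, P2, mu, p):
    """The constant K_p from (21) for the density rho = dP1/dP2."""
    n = len(mu)
    return max(
        sum(mu[i] * rho[i][j] ** p * P2[i][j] for i in A for j in range(n) if j not in A)
        / sum(mu[i] for i in A)
        for A in admissible_sets(mu)
    )

n = 6
w = [1.0, 2.0, 4.0, 3.0, 1.0, 2.0]
mu = [x / sum(w) for x in w]
P1 = [[0.0] * n for _ in range(n)]          # nearest-neighbour proposal on a path
for i in range(n - 1):
    P1[i][i + 1] = P1[i + 1][i] = 0.5
P1[0][0] = P1[n - 1][n - 1] = 0.5
P2 = [[1.0 / n] * n for _ in range(n)]      # uniform proposal, so P1 << P2
rho = [[P1[i][j] / P2[i][j] for j in range(n)] for i in range(n)]
phi1 = conductance(metropolis_kernel(w, P1), mu)
phi2 = conductance(metropolis_kernel(w, P2), mu)
for p in (1.5, 2.0, 4.0):
    bound = phi2 ** ((p - 1) / p) * K_p(rho, P2, mu, p) ** (1 / p)
    print(f"p = {p}: phi(M1) = {phi1:.4f} <= {bound:.4f}")
```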


Employing comparison inequalities in terms of the conductance is not an entirely new idea, see for example [19, Proof of Theorem 4]. There the authors obtained a conductance inequality for transition kernels with bounded Radon-Nikodym derivatives w.r.t. each other. An immediate consequence of Lemma 7 and (20) is the following theorem.

Theorem 8 (Spectral gap comparison). Let the assumptions of Lemma 7 be satisfied and let the Markov operators associated with $M_1$ and $M_2$ be positive and self-adjoint on $L^2(\mu)$. Then

$$1-\Lambda(M_2)\ge\frac12\Big(\frac{1-\Lambda(M_1)}{2}\Big)^{2p/(p-1)}K_p^{-2/(p-1)}.$$

We apply Theorem 8 to prove our convergence result for the gpCN 
Metropolis. We therefore verify in the following section the condition that 
the corresponding Markov operator is positive. 


4.2 Positivity of Metropolis with Gaussian proposals 

Recall that $(f,g)_\mu = \int fg\,\mathrm d\mu$ denotes the inner product of $L^2(\mu)$ and that a Markov operator $K\colon L^2(\mu)\to L^2(\mu)$ is positive if $(Kf,f)_\mu\ge 0$ for all $f\in L^2(\mu)$.
Lemma 9 (Positivity of proposals). Let $\mu_0 = N(0,C)$ be a Gaussian measure on a separable Hilbert space $\mathcal H$ and let $P(u,\cdot) = N(Au,Q)$ be a $\mu_0$-reversible proposal kernel with a bounded, linear operator $A\colon\mathcal H\to\mathcal H$. If there exists a bounded, linear operator $B\colon\mathcal H\to\mathcal H$ such that

$$B^2 = A, \qquad BC = CB^*,$$

and $D := C - BCB^*$ is positive and trace class, then the Markov operator associated with the proposal $P$ is positive on $L^2(\mu_0)$.

Proof. Because of the assumptions on $B$ and $D$, the proposal kernel $P_1(u,\cdot) = N(Bu,D)$ is well-defined. Further, since $BCB^* + D = C$, we derive

$$P_1(u,\mathrm dv)\,\mu_0(\mathrm du) = N\left(0, \begin{pmatrix} C & CB^*\\ BC & C\end{pmatrix}\right),$$

which, by $BC = CB^*$, leads to the $\mu_0$-reversibility of $P_1$ and, thus, to the self-adjointness of its associated Markov operator in $L^2(\mu_0)$. It remains to prove that $P_1^2 = P$ holds for the associated Markov operators, which then immediately yields the assertion, since $(Pf,f)_{\mu_0} = (P_1f,P_1f)_{\mu_0}\ge 0$. The equality of the Markov operators is equivalent to the equality of the measures $P_1^2(u,\cdot)$ and $P(u,\cdot)$ for all $u\in\mathcal H$. In order to show that $P_1^2(u,\cdot) = P(u,\cdot)$ for all $u\in\mathcal H$, we take $(\xi_n)_{n\in\mathbb N}$ to be an i.i.d. sequence with $\xi_1\sim N(0,D)$ and construct an auxiliary Markov chain by

$$X_{n+1} = BX_n + \xi_n,$$

where $X_1 = u$ for an arbitrary $u\in\mathcal H$. The transition kernel of this chain is $P_1$. In particular, for $G\in\mathcal B(\mathcal H)$ we have $\mathbb P[X_3\in G] = P_1^2(u,G)$. By

$$X_3 = BX_2 + \xi_2 = B^2u + B\xi_1 + \xi_2$$

and $B\xi_1 + \xi_2\sim N(0, BDB^* + D)$ we obtain $X_3\sim N(B^2u, BDB^* + D)$. Due to the assumptions we have $B^2 = A$ and

$$BDB^* + D = B(C - BCB^*)B^* + C - BCB^* = C - ACA^*.$$

The last step $C - ACA^* = Q$ follows by the assumed $\mu_0$-reversibility of $P$, because we know from Section 3.2 that $P$ being $\mu_0$-reversible is equivalent to $A$ and $Q$ satisfying $AC = CA^*$ and $ACA^* + Q = C$. We thus arrive at $X_3\sim N(Au,Q)$, which proves $P_1^2(u,\cdot) = P(u,\cdot)$. □

The next lemma extends the previous result to Markov operators associated with Metropolis algorithms. The proof follows the same line of arguments as developed in [27, Section 3.4] and is therefore omitted.
Lemma 10 (Positivity of Metropolis kernels). Let $\mu$ be a measure on $\mathcal H$ given by (1) and let $P$ be a $\mu_0$-reversible proposal kernel whose associated Markov operator is positive on $L^2(\mu_0)$. Then the Markov operator associated with the $\mu$-reversible Metropolis kernel

$$M(u,\mathrm dv) = \alpha(u,v)P(u,\mathrm dv) + \delta_u(\mathrm dv)\int_{\mathcal H}(1-\alpha(u,w))\,P(u,\mathrm dw)$$

with $\alpha(u,v) = \min\{1,\exp(\Phi(u)-\Phi(v))\}$ is positive on $L^2(\mu)$.

The previous two lemmas lead to the following result about the gpCN Metropolis.

Theorem 11 (Positivity of gpCN Metropolis). Let $\mu_0 = N(0,C)$ and $\mu$ be as in (1), and let $M_\Gamma$ denote the gpCN Metropolis kernel as in Corollary 4. Then the associated Markov operator $M_\Gamma$ is self-adjoint and positive on $L^2(\mu)$.

Proof. It is enough to verify the assumptions of Lemma 9 for the gpCN proposal, since Lemma 10 then yields the assertion. Recall that $P_\Gamma(u,\cdot) = N(A_\Gamma u, s^2C_\Gamma)$, which is $\mu_0$-reversible by construction with bounded $A_\Gamma = C^{1/2}\sqrt{I - s^2(I+H_\Gamma)^{-1}}\,C^{-1/2}$. By choosing

$$B := C^{1/2}\big(I - s^2(I+H_\Gamma)^{-1}\big)^{1/4}C^{-1/2},$$

we obtain $B^2 = A_\Gamma$ and $BC = CB^*$. Moreover,

$$D = C - BCB^* = C^{1/2}\Big(I - \sqrt{I - s^2(I+H_\Gamma)^{-1}}\Big)C^{1/2}.$$

The eigenvalues of $I - \sqrt{I - s^2(I+H_\Gamma)^{-1}}$ take the form $1 - \sqrt{1 - s^2/(1+\lambda)} > 0$ with $\lambda\ge 0$ being an eigenvalue of $H_\Gamma$. Thus, $I - \sqrt{I - s^2(I+H_\Gamma)^{-1}}$ is positive and bounded, which yields that $D$ is positive and trace class, since $D$ is then a product of two Hilbert-Schmidt operators and one bounded operator. Thus, the conditions of Lemma 9 are satisfied and the assertion follows. □

4.3 Density between pCN and gpCN proposal 

In this section we show that for any state $u\in\mathcal H$ the gpCN proposal is equivalent to the pCN proposal in the sense of measures. Moreover, we also derive an integrability result for the corresponding density. For proving the equivalence we need the following technical result.
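Lemma 12. Let the assumptions of Corollary 4 be satisfied and define the bounded, linear operator $\tilde A_\Gamma$ by

$$\tilde A_\Gamma := A_0 - A_\Gamma = \sqrt{1-s^2}\,I - C^{1/2}\sqrt{I - s^2(I+H_\Gamma)^{-1}}\,C^{-1/2}. \qquad (22)$$

Then $\mathrm{Im}\,\tilde A_\Gamma\subseteq\mathrm{Im}\,C^{1/2}$, i.e., $C^{-1/2}\tilde A_\Gamma$ is a bounded operator on $\mathcal H$.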


The proof of this lemma can be found in Appendix B.2. It is similar to the proof of Lemma 2 and again rather technical. However, Lemma 12 ensures that we can apply the Cameron-Martin theorem (Theorem 21 in Appendix A) in the proof of the following result. The other main tool for deriving the next theorem is a variant of the Feldman-Hajek theorem as stated in Theorem 22 in Appendix A.


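Theorem 13 (Density of pCN w.r.t. gpCN). With the notation and assumptions of Corollary 4 the following holds.

1. The measures $\mu_0 = N(0,C)$ and $\mu_\Gamma = N(0,C_\Gamma)$ are equivalent with

$$\pi_\Gamma(u) := \frac{\mathrm d\mu_0}{\mathrm d\mu_\Gamma}(u) = \frac{\exp\big(\frac12(\Gamma u,u)\big)}{\sqrt{\det(I+H_\Gamma)}}. \qquad (23)$$

2. For $u\in\mathcal H$ the measures $P_0(u,\cdot)$ and $P_\Gamma(u,\cdot)$ are equivalent with

$$\frac{\mathrm dP_0(u)}{\mathrm dP_\Gamma(u)}(v) = \pi_{CM}\Big(\frac1s\tilde A_\Gamma u, \frac1s(v - A_\Gamma u)\Big)\,\pi_\Gamma\Big(\frac1s(v - A_\Gamma u)\Big), \qquad (24)$$

where $\tilde A_\Gamma$ is as in (22) and

$$\pi_{CM}(h,v) := \exp\Big(-\frac12\|C^{-1/2}h\|^2 + (C^{-1}h,v)\Big). \qquad (25)$$

(The subscript in $\pi_{CM}$ indicates the Cameron-Martin formula.)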
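In the scalar case $\mathcal H = \mathbb R$ (our simplifying assumption, with $C = c$, $\Gamma = \gamma$), all operators commute and $A_\Gamma = \sqrt{1 - s^2/(1+c\gamma)}$, $C_\Gamma = c/(1+c\gamma)$. The sketch below evaluates the product formula (24), built from (23) and (25), and compares it with the plain ratio of the two Gaussian proposal densities $N(\sqrt{1-s^2}\,u, s^2c)$ and $N(A_\Gamma u, s^2C_\Gamma)$. Parameter values are arbitrary and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gpcn_scalar(c, gamma, s):
    """Scalar ingredients A_0, A_Gamma and C_Gamma for H = R, C = c, Gamma = gamma."""
    h_gamma = c * gamma                          # H_Gamma = C^{1/2} Gamma C^{1/2}
    a0 = math.sqrt(1 - s ** 2)                   # pCN mean factor A_0
    a_gamma = math.sqrt(1 - s ** 2 / (1 + h_gamma))
    c_gamma = c / (1 + h_gamma)                  # (C^{-1} + Gamma)^{-1}
    return a0, a_gamma, c_gamma

def rho_formula(u, v, c, gamma, s):
    """dP_0(u)/dP_Gamma(u) at v via (23)-(25)."""
    a0, a_gamma, _ = gpcn_scalar(c, gamma, s)
    h = (a0 - a_gamma) * u / s                   # (1/s) tilde A_Gamma u
    w = (v - a_gamma * u) / s
    pi_cm = math.exp(-h ** 2 / (2 * c) + h * w / c)
    pi_gamma = math.exp(0.5 * gamma * w ** 2) / math.sqrt(1 + c * gamma)
    return pi_cm * pi_gamma

def rho_direct(u, v, c, gamma, s):
    """The same density as a ratio of the two Gaussian proposal densities."""
    a0, a_gamma, c_gamma = gpcn_scalar(c, gamma, s)
    return normal_pdf(v, a0 * u, s ** 2 * c) / normal_pdf(v, a_gamma * u, s ** 2 * c_gamma)

print(rho_formula(1.2, 0.7, 2.0, 3.0, 0.4), rho_direct(1.2, 0.7, 2.0, 3.0, 0.4))
```

In one dimension the equivalence of the measures is trivial; the point of the check is only that the factorization into a Cameron-Martin term and a covariance-change term reproduces the correct density.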

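Proof. We prove (23) by verifying the assumptions of Theorem 22 from Appendix A. We observe

$$I - C^{-1/2}C_\Gamma C^{-1/2} = I - (I+H_\Gamma)^{-1}$$

and set $T_\Gamma := I - (I+H_\Gamma)^{-1}$. The eigenvalues $(t_n)_{n\in\mathbb N}$ of the self-adjoint operator $T_\Gamma$ are given by

$$t_n = \frac{\lambda_n}{1+\lambda_n}\in[0,1),$$

where $(\lambda_n)_{n\in\mathbb N}$ are the eigenvalues of the positive trace class operator $H_\Gamma$. Thus, $T_\Gamma$ is also trace class and satisfies $(T_\Gamma u,u) < \|u\|^2$ for any $u\in\mathcal H$, $u\neq 0$. Then the assertion follows, after taking the reciprocal of the density in Theorem 22, by Theorem 22 and

$$T_\Gamma(I-T_\Gamma)^{-1} = \big(I - (I+H_\Gamma)^{-1}\big)(I+H_\Gamma) = H_\Gamma$$

as well as

$$(H_\Gamma C^{-1/2}v, C^{-1/2}v) = (\Gamma v,v) \qquad \forall v\in\mathcal H.$$

To show the equivalence of $P_0(u,\cdot)$ and $P_\Gamma(u,\cdot)$ for any $u\in\mathcal H$ we introduce the auxiliary kernel $K_\Gamma(u,\cdot) = N(A_\Gamma u, s^2C)$. The first assertion and a simple change of variables, see Lemma 23 in the appendix, lead to

$$\frac{\mathrm dK_\Gamma(u)}{\mathrm dP_\Gamma(u)}(v) = \pi_\Gamma\Big(\frac1s(v - A_\Gamma u)\Big), \qquad u,v\in\mathcal H.$$

Thus, it remains to prove the equivalence of $K_\Gamma(u,\cdot)$ and $P_0(u,\cdot)$ for any $u\in\mathcal H$. By the Cameron-Martin formula, see Theorem 21 in Appendix A, this holds iff

$$\mathrm{Im}\big(A_\Gamma - \sqrt{1-s^2}\,I\big)\subseteq\mathrm{Im}(C^{1/2}),$$

which was shown in Lemma 12. Now Theorem 21 combined with a change of variables, see Lemma 23, yields

$$\frac{\mathrm dP_0(u)}{\mathrm dK_\Gamma(u)}(v) = \pi_{CM}\Big(\frac1s\big[\sqrt{1-s^2}\,I - A_\Gamma\big]u, \frac1s(v - A_\Gamma u)\Big),$$

and the assertion follows by

$$\frac{\mathrm dP_0(u)}{\mathrm dP_\Gamma(u)}(v) = \frac{\mathrm dP_0(u)}{\mathrm dK_\Gamma(u)}(v)\,\frac{\mathrm dK_\Gamma(u)}{\mathrm dP_\Gamma(u)}(v).$$

□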


Note that Theorem 13 implies that for any $\Gamma_1,\Gamma_2\in\mathcal L^1_+(\mathcal H)$ there exists a density between the two gpCN proposals $P_{\Gamma_1}(u)$ and $P_{\Gamma_2}(u)$. However, for the application of Theorem 8 we still have to verify condition (21). This is partly addressed in the following result.


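Theorem 14 (Integrability of gpCN density). Let the assumptions of Lemma 12 be satisfied and set

$$\rho_\Gamma(u,v) := \frac{\mathrm dP_0(u)}{\mathrm dP_\Gamma(u)}(v), \qquad u,v\in\mathcal H.$$

Then, for any $0<p<1+\frac{1}{2\|H_\Gamma\|}$ there exist constants $c = c(p,H_\Gamma)<\infty$ and $b = b(p,\|C^{-1/2}\tilde A_\Gamma\|)<\infty$ such that

$$\int_{\mathcal H}\rho_\Gamma(u,v)^p\,P_\Gamma(u,\mathrm dv)\le c\exp(b\|u\|^2), \qquad u\in\mathcal H.$$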
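Proof. We employ the same notation as in Theorem 13, i.e., let $\mu_0 = N(0,C)$ and $\mu_\Gamma = N(0,C_\Gamma)$ as well as $\pi_\Gamma$ and $\pi_{CM}$ be as in (23) and (25), respectively. By Theorem 13 we know

$$\rho_\Gamma(u,v) = \pi_{CM}\Big(\frac1s\tilde A_\Gamma u, \frac1s(v - A_\Gamma u)\Big)\,\pi_\Gamma\Big(\frac1s(v - A_\Gamma u)\Big).$$

By first applying a change of variables, see Lemma 23, and then the Cauchy-Schwarz inequality we obtain

$$\begin{aligned}
\int_{\mathcal H}\rho_\Gamma^p(u,v)\,P_\Gamma(u,\mathrm dv) &= \int_{\mathcal H}\pi_{CM}^p\Big(\frac1s\tilde A_\Gamma u, v\Big)\,\pi_\Gamma^p(v)\,\mu_\Gamma(\mathrm dv) = \int_{\mathcal H}\pi_{CM}^p\Big(\frac1s\tilde A_\Gamma u, v\Big)\,\pi_\Gamma^{p-1}(v)\,\mu_0(\mathrm dv)\\
&\le \Big(\int_{\mathcal H}\pi_{CM}^{2p}\Big(\frac1s\tilde A_\Gamma u, v\Big)\,\mu_0(\mathrm dv)\Big)^{1/2}\Big(\int_{\mathcal H}\pi_\Gamma^{2p-2}(v)\,\mu_0(\mathrm dv)\Big)^{1/2}.
\end{aligned}$$

Furthermore, by applying (35) from Appendix A we have for $h\in\mathrm{Im}\,C^{1/2}$

$$\int_{\mathcal H}\pi_{CM}^{2p}(h,v)\,\mu_0(\mathrm dv) = e^{-p\|C^{-1/2}h\|^2}\int_{\mathcal H}e^{2p(C^{-1}h,v)}\,\mu_0(\mathrm dv) = \exp\big((2p^2-p)\|C^{-1/2}h\|^2\big),$$

so that, with $h = \frac1s\tilde A_\Gamma u$, the first factor above equals $\exp\big(\frac{2p^2-p}{2s^2}\|C^{-1/2}\tilde A_\Gamma u\|^2\big)$. We apply $\|C^{-1/2}\tilde A_\Gamma u\|\le\|C^{-1/2}\tilde A_\Gamma\|\,\|u\|$ and set

$$b := \frac{2p^2-p}{2s^2}\,\|C^{-1/2}\tilde A_\Gamma\|^2.$$

Note that $b<0$ for $p<1/2$. Due to the assumptions on $p$ we have

$$((2p-2)H_\Gamma v,v)\le\max\{0,2p-2\}\,\|H_\Gamma\|\,\|v\|^2<\|v\|^2, \qquad v\in\mathcal H\setminus\{0\}.$$

Thus, we can apply (36) from Appendix A and get

$$\int_{\mathcal H}\pi_\Gamma^{2p-2}(v)\,\mu_0(\mathrm dv) = \int_{\mathcal H}\frac{\exp\big(\frac12((2p-2)H_\Gamma C^{-1/2}v, C^{-1/2}v)\big)}{\det(I+H_\Gamma)^{(2p-2)/2}}\,\mu_0(\mathrm dv) = \Big(\det\big(I-(2p-2)H_\Gamma\big)\det(I+H_\Gamma)^{2p-2}\Big)^{-1/2} =: c^2.$$

Since $H_\Gamma$ is positive and trace class, $\det(I+H_\Gamma)$ is well-defined (see Appendix A) and $\det(I+H_\Gamma)\in[1,\infty)$. Furthermore, due to $((2p-2)H_\Gamma v,v)<\|v\|^2$, all eigenvalues of $(2p-2)H_\Gamma$ are smaller than $1$, which ensures that $\det(I-(2p-2)H_\Gamma)>0$ and, hence, $0<c^2<\infty$. This proves the assertion. □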
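As a sanity check of Theorem 14, the sketch below again assumes the scalar case $\mathcal H = \mathbb R$ with $C = c$, $\Gamma = \gamma$ (arbitrary illustrative values $c = \gamma = 1$, $s = 0.5$, $p = 1.3 < 1 + 1/(2c\gamma)$). It evaluates $\int\rho_\Gamma^p(u,v)\,P_\Gamma(u,\mathrm dv)$ by the trapezoidal rule after the change of variables used in the proof, and compares it with $c\exp(bu^2)$ for the constants $c$ and $b$ obtained above. The function name is hypothetical.

```python
import math

def theorem14_check(u, c=1.0, gamma=1.0, s=0.5, p=1.3, L=40.0, m=80001):
    """Return (integral, bound) for the scalar case H = R."""
    a0 = math.sqrt(1 - s ** 2)
    ag = math.sqrt(1 - s ** 2 / (1 + c * gamma))
    cg = c / (1 + c * gamma)
    h = (a0 - ag) * u / s                      # (1/s) tilde A_Gamma u

    # After w = (v - A_Gamma u)/s the integral is int pi_CM(h, w)^p pi_Gamma(w)^p mu_Gamma(dw).
    def integrand(w):
        log_pi_cm = -h ** 2 / (2 * c) + h * w / c
        log_pi_g = 0.5 * gamma * w ** 2 - 0.5 * math.log(1 + c * gamma)
        log_mu_g = -w ** 2 / (2 * cg) - 0.5 * math.log(2 * math.pi * cg)
        return math.exp(p * (log_pi_cm + log_pi_g) + log_mu_g)

    dx = 2 * L / (m - 1)
    vals = [integrand(-L + k * dx) for k in range(m)]
    integral = dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
    c_sq = ((1 - (2 * p - 2) * c * gamma) * (1 + c * gamma) ** (2 * p - 2)) ** -0.5
    b = (2 * p ** 2 - p) / (2 * s ** 2) * (a0 - ag) ** 2 / c
    return integral, math.sqrt(c_sq) * math.exp(b * u ** 2)

for u in (0.0, 1.0, 2.0, 3.0):
    integral, bound = theorem14_check(u)
    print(f"u = {u}: integral = {integral:.5f} <= c exp(b u^2) = {bound:.5f}")
```

The gap between the two columns comes from the Cauchy-Schwarz step in the proof; the bound is not meant to be sharp.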
Thus, the above theorem allows us to estimate the integral in (21). We obtain for $0<p<1+1/(2\|H_\Gamma\|)$ that

$$\int_A\int_{A^c}\rho_\Gamma(u,v)^p\,P_\Gamma(u,\mathrm dv)\,\mu(\mathrm du)\le c\int_A\exp(b\|u\|^2)\,\mu(\mathrm du).$$

Unfortunately, if we divide the right-hand side by $\mu(A)$ and take the supremum over all $A$ with $0<\mu(A)\le 1/2$, this is, in general, unbounded. In the next section we introduce restrictions of the target measure for which we can circumvent this problem.


4.4 Restrictions of the target measure 

In order to show boundedness of $K_p$ from (21) for the gpCN proposal we consider restrictions of the target measure to bounded sets. For appropriately chosen sets, the restricted measures become arbitrarily close to the target measure. Let $R\in(0,\infty]$ and set

$$\mathcal H_R := \{u\in\mathcal H : \|u\|<R\}.$$

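Definition 15 (Restricted measure). Let $\mu$ be a probability measure on $(\mathcal H,\mathcal B(\mathcal H))$ and $R\in(0,\infty]$. We define its restriction to $\mathcal H_R$ as the probability measure $\mu_R$ on $\mathcal H$ given by

$$\mu_R(\mathrm du) := \frac{1}{\mu(\mathcal H_R)}1_{\mathcal H_R}(u)\,\mu(\mathrm du). \qquad (26)$$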

For sufficiently large $R$ the measure $\mu_R$ is close to $\mu$, because

$$\|\mu_R-\mu\|_{\mathrm{tv}} = \int_{\mathcal H}\Big|\frac{1_{\mathcal H_R}(u)}{\mu(\mathcal H_R)} - 1\Big|\,\mu(\mathrm du) = 1 - \mu(\mathcal H_R) + \mu(\mathcal H_R^c) = 2\mu(\mathcal H_R^c),$$

and since $\mu$ is a probability measure on $(\mathcal H,\mathcal B(\mathcal H))$ there exists for any $\varepsilon>0$ a number $R>0$ such that $2\mu(\mathcal H_R^c)<\varepsilon$. Let us mention here that restricted measures appear, for example, also in [3, Equation (3.5)] and in the recent work [15], in order to analyze the convergence of Metropolis-Hastings based algorithms.
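For a concrete feel for the identity $\|\mu_R - \mu\|_{\mathrm{tv}} = 2\mu(\mathcal H_R^c)$, the following sketch assumes the simplest possible case, $\mu = N(0,1)$ on $\mathbb R$ and $\mathcal H_R = (-R,R)$, where $\mu(\mathcal H_R^c) = \operatorname{erfc}(R/\sqrt2)$. It compares a quadrature of the total variation integral with the closed form and finds, on a grid, a radius with $2\mu(\mathcal H_R^c)<\varepsilon$. The function names are hypothetical.

```python
import math

def tv_restricted_gaussian(R, L=12.0, m=20001):
    """||mu_R - mu||_tv for mu = N(0, 1) on R and H_R = (-R, R), via quadrature."""
    def pdf(x):
        return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    def trap(a, b):
        dx = (b - a) / (m - 1)
        vals = [pdf(a + k * dx) for k in range(m)]
        return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    mass = math.erf(R / math.sqrt(2))          # mu(H_R)
    inside = (1 / mass - 1) * trap(-R, R)      # |1/mu(H_R) - 1| integrated over H_R
    outside = 2 * trap(R, L)                   # |0 - 1| integrated over both tails
    return inside + outside

def smallest_radius(eps, step=0.01):
    """Smallest grid radius R with 2 mu(H_R^c) = 2 erfc(R / sqrt(2)) < eps."""
    k = 1
    while 2 * math.erfc(k * step / math.sqrt(2)) >= eps:
        k += 1
    return k * step

for R in (0.5, 1.0, 2.0):
    print(R, tv_restricted_gaussian(R), 2 * math.erfc(R / math.sqrt(2)))
print("R for eps = 0.01:", smallest_radius(0.01))
```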

We now ask whether good convergence properties of a $\mu$-reversible transition kernel $K$ are inherited by a suitably modified $\mu_R$-reversible transition kernel $K_R$.

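Definition 16 (Restricted transition kernel). Let $K$ be a transition kernel on $\mathcal H$ and $R\in(0,\infty]$. We define its restriction to $\mathcal H_R$ as the transition kernel $K_R\colon\mathcal H\times\mathcal B(\mathcal H)\to[0,1]$ given by

$$K_R(u,\mathrm dv) := 1_{\mathcal H_R}(v)K(u,\mathrm dv) + K(u,\mathcal H_R^c)\,\delta_u(\mathrm dv). \qquad (27)$$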
Note that if $K$ is $\mu$-reversible, then $K_R$ is $\mu_R$-reversible, and if $K$ is of Metropolis form (4), then so is $K_R$.

Proposition 17. Let $\mu$ be a probability measure on $(\mathcal H,\mathcal B(\mathcal H))$ and $K$ be a $\mu$-reversible transition kernel. Then for any $R>0$ the transition kernel $K_R$ given in (27) is $\mu_R$-reversible with $\mu_R$ as in (26). Moreover, for a Metropolis kernel $M$ of the form (4) the corresponding restricted kernel $M_R$ is again a Metropolis kernel

$$M_R(u,\mathrm dv) = \alpha_R(u,v)P(u,\mathrm dv) + \delta_u(\mathrm dv)\Big(1 - \int_{\mathcal H}\alpha_R(u,w)\,P(u,\mathrm dw)\Big)$$

with $\alpha_R(u,v) := 1_{\mathcal H_R}(v)\alpha(u,v)$.

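Proof. Recall that $K$ is $\mu$-reversible iff

$$\int_A K(u,B)\,\mu(\mathrm du) = \int_B K(u,A)\,\mu(\mathrm du), \qquad \forall A,B\in\mathcal B(\mathcal H).$$

Let $A,B\in\mathcal B(\mathcal H)$. We have

$$\begin{aligned}
\int_A K_R(u,B)\,\mu_R(\mathrm du) &= \int_A K(u,B\cap\mathcal H_R)\,\mu_R(\mathrm du) + \int_{A\cap B}K(u,\mathcal H_R^c)\,\mu_R(\mathrm du)\\
&= \frac{1}{\mu(\mathcal H_R)}\int_{A\cap\mathcal H_R}K(u,B\cap\mathcal H_R)\,\mu(\mathrm du) + \int_{A\cap B}K(u,\mathcal H_R^c)\,\mu_R(\mathrm du).
\end{aligned}$$

Because of the $\mu$-reversibility of $K$ we can interchange $A$ and $B$, which leads to the first assertion. The second statement follows by

$$\begin{aligned}
M_R(u,\mathrm dv) &= 1_{\mathcal H_R}(v)M(u,\mathrm dv) + \delta_u(\mathrm dv)M(u,\mathcal H_R^c)\\
&= 1_{\mathcal H_R}(v)\alpha(u,v)P(u,\mathrm dv) + \delta_u(\mathrm dv)\Big(1 - \int_{\mathcal H}\alpha(u,w)P(u,\mathrm dw) + \int_{\mathcal H_R^c}\alpha(u,w)P(u,\mathrm dw)\Big)\\
&= 1_{\mathcal H_R}(v)\alpha(u,v)P(u,\mathrm dv) + \delta_u(\mathrm dv)\Big(1 - \int_{\mathcal H_R}\alpha(u,w)P(u,\mathrm dw)\Big).
\end{aligned}$$

□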
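Proposition 17 has an exact finite-state analogue, which the sketch below checks under our own illustrative assumptions: six states with arbitrary target weights, a nearest-neighbour proposal, and an index set $S$ playing the role of $\mathcal H_R$. It builds $M_R$ via (27), verifies detailed balance w.r.t. $\mu_R$, and confirms that $M_R$ coincides with the Metropolis kernel whose acceptance probability is $\alpha_R(i,j) = 1_S(j)\alpha(i,j)$. The helper names are hypothetical.

```python
def metropolis_from_acceptance(P, acc):
    """M(i, j) = acc(i, j) P(i, j) for j != i; the remaining mass stays at i."""
    n = len(P)
    M = [[acc(i, j) * P[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][i] = 1.0 - sum(M[i])
    return M

def restrict_kernel(K, S):
    """Finite-state version of (27): K_R(i, .) = 1_S K(i, .) + K(i, S^c) delta_i."""
    n = len(K)
    KR = [[K[i][j] if j in S else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        KR[i][i] += sum(K[i][j] for j in range(n) if j not in S)
    return KR

n = 6
w = [1.0, 2.0, 4.0, 3.0, 1.0, 2.0]
mu = [x / sum(w) for x in w]
P = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    P[i][i + 1] = P[i + 1][i] = 0.5
P[0][0] = P[n - 1][n - 1] = 0.5

def alpha(i, j):
    return min(1.0, w[j] / w[i])

S = {0, 1, 2, 3}                                   # plays the role of H_R
M = metropolis_from_acceptance(P, alpha)
M_R = restrict_kernel(M, S)
mu_S = sum(mu[k] for k in S)
mu_R = [mu[i] / mu_S if i in S else 0.0 for i in range(n)]
M_R_alt = metropolis_from_acceptance(P, lambda i, j: alpha(i, j) if j in S else 0.0)
balance_err = max(abs(mu_R[i] * M_R[i][j] - mu_R[j] * M_R[j][i])
                  for i in range(n) for j in range(n))
form_err = max(abs(M_R[i][j] - M_R_alt[i][j]) for i in range(n) for j in range(n))
print("detailed balance error:", balance_err, " Metropolis form error:", form_err)
```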


Now we ask whether a spectral gap of $K$ on $L^2(\mu)$ implies a spectral gap of the Markov operator associated with $K_R$ on $L^2(\mu_R)$. Note that

$$K_Rf(u) = \int_{\mathcal H}f(v)\,K_R(u,\mathrm dv) = \int_{\mathcal H_R}f(v)\,K(u,\mathrm dv) + f(u)K(u,\mathcal H_R^c).$$

We have the following relation between $\|K_R\|_{\mu_R}$ and $\|K\|_\mu$.
Lemma 18. With the notation and assumptions from above it holds that

$$\|K_R\|_{\mu_R}\le\|K\|_\mu + \sup_{u\in\mathcal H_R}K(u,\mathcal H_R^c). \qquad (28)$$

Furthermore, if the Markov operator $K$ is positive on $L^2(\mu)$, then $K_R$ is also positive on $L^2(\mu_R)$.

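Proof. For $f\in L^2(\mu_R)$ let

$$(Ef)(u) := 1_{\mathcal H_R}(u)f(u), \qquad Ef\in L^2(\mu).$$

Note that $\|f\|_{2,\mu_R} = \mu(\mathcal H_R)^{-1/2}\|Ef\|_{2,\mu}$ and that $\int f\,\mathrm d\mu_R = 0$ implies $\int Ef\,\mathrm d\mu = 0$. Further, for any $f\in L^2(\mu_R)$ we have

$$\begin{aligned}
\|K_Rf\|_{2,\mu_R}^2 &= \int_{\mathcal H_R}\Big|\int_{\mathcal H_R}f(v)K(u,\mathrm dv) + f(u)K(u,\mathcal H_R^c)\Big|^2\mu_R(\mathrm du)\\
&= \frac{1}{\mu(\mathcal H_R)}\int_{\mathcal H}\Big|1_{\mathcal H_R}(u)\int_{\mathcal H}Ef(v)K(u,\mathrm dv) + g_R(u)Ef(u)\Big|^2\mu(\mathrm du) = \frac{\|1_{\mathcal H_R}K(Ef) + g_REf\|_{2,\mu}^2}{\mu(\mathcal H_R)}
\end{aligned}$$

with $g_R(u) := 1_{\mathcal H_R}(u)K(u,\mathcal H_R^c)$. Then

$$\frac{\|K_Rf\|_{2,\mu_R}}{\|f\|_{2,\mu_R}} = \frac{\|1_{\mathcal H_R}K(Ef) + g_REf\|_{2,\mu}}{\|Ef\|_{2,\mu}}\le\frac{\|K(Ef)\|_{2,\mu} + \|g_REf\|_{2,\mu}}{\|Ef\|_{2,\mu}}\le\frac{\|K(Ef)\|_{2,\mu}}{\|Ef\|_{2,\mu}} + \sup_{u\in\mathcal H_R}K(u,\mathcal H_R^c),$$

where we applied $\|1_{\mathcal H_R}h\|_{2,\mu}\le\|h\|_{2,\mu}$ with $h = K(Ef)$ in the first inequality. By taking the supremum over all $f\in L^2_0(\mu_R)$ and because of $E(L^2_0(\mu_R))\subseteq L^2_0(\mu)$, the first assertion follows. Moreover, we have for $f\in L^2(\mu_R)$ that

$$\begin{aligned}
(K_Rf,f)_{\mu_R} &= \int_{\mathcal H_R}\Big(\int_{\mathcal H_R}f(v)K(u,\mathrm dv) + f(u)K(u,\mathcal H_R^c)\Big)f(u)\,\mu_R(\mathrm du)\\
&= \frac{1}{\mu(\mathcal H_R)}\int_{\mathcal H}\int_{\mathcal H}(Ef)(v)K(u,\mathrm dv)(Ef)(u)\,\mu(\mathrm du) + \int_{\mathcal H_R}f(u)^2K(u,\mathcal H_R^c)\,\mu_R(\mathrm du).
\end{aligned}$$

The second term is always nonnegative since $f(u)^2K(u,\mathcal H_R^c)\ge 0$ for all $u\in\mathcal H$, and the first term coincides with $(K(Ef),Ef)_\mu/\mu(\mathcal H_R)\ge 0$. Thus, the second statement is proven. □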
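Inequality (28) can be illustrated on a finite state space, again an assumption made only for illustration: a lazy (hence positive) Metropolis chain on six states with arbitrary weights, restricted to the index set $S = \{0,1,2,3\}$ in place of $\mathcal H_R$. For positive, reversible kernels the norm on mean-zero functions equals the top of the spectrum there, which the sketch approximates by power iteration. Helper names are hypothetical.

```python
import math

def metropolis_kernel(w, P):
    """Metropolis kernel for target weights w and a symmetric proposal P."""
    n = len(w)
    M = [[P[i][j] * min(1.0, w[j] / w[i]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        M[i][i] = 1.0 - sum(M[i])
    return M

def norm_mean_zero(K, mu, iters=3000):
    """Power iteration on L^2_0(mu); for a positive, mu-reversible K the limit of the
    Rayleigh quotient is the top of the spectrum there, i.e. the operator norm."""
    n = len(mu)
    f = [math.sin(i + 1.0) for i in range(n)]
    rq = 0.0
    for _ in range(iters):
        m = sum(mu[i] * f[i] for i in range(n))
        f = [x - m for x in f]
        nrm = math.sqrt(sum(mu[i] * f[i] ** 2 for i in range(n)))
        f = [x / nrm for x in f]
        g = [sum(K[i][j] * f[j] for j in range(n)) for i in range(n)]
        rq = sum(mu[i] * g[i] * f[i] for i in range(n))
        f = g
    return rq

n = 6
w = [1.0, 2.0, 4.0, 3.0, 1.0, 2.0]
mu = [x / sum(w) for x in w]
P = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    P[i][i + 1] = P[i + 1][i] = 0.5
P[0][0] = P[n - 1][n - 1] = 0.5
M = metropolis_kernel(w, P)
K = [[0.5 * ((i == j) + M[i][j]) for j in range(n)] for i in range(n)]  # lazy => positive

S = [0, 1, 2, 3]                                                        # plays the role of H_R
exit_prob = [sum(K[i][k] for k in range(n) if k not in S) for i in S]   # K(u, H_R^c)
mu_R = [mu[i] / sum(mu[k] for k in S) for i in S]
# K_R on S x S: entries of K inside S, plus K(u, H_R^c) on the diagonal, cf. (27).
K_R = [[K[i][j] + (exit_prob[a] if i == j else 0.0) for j in S] for a, i in enumerate(S)]
lhs = norm_mean_zero(K_R, mu_R)
rhs = norm_mean_zero(K, mu) + max(exit_prob)
print(f"||K_R|| = {lhs:.4f} <= ||K|| + sup K(u, H_R^c) = {rhs:.4f}")
```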

Lemma 18 tells us that there exists an absolute spectral gap of $K_R$ if there exists an absolute spectral gap of $K$ and $\sup_{u\in\mathcal H_R}K(u,\mathcal H_R^c)$ is sufficiently small. Indeed, we can apply this result to the pCN Metropolis algorithm.

Theorem 19 (Spectral gap of restricted pCN Metropolis). Let $\mu$ be as in (1) and let $M_0$ denote the $\mu$-reversible pCN Metropolis kernel. If there exists a spectral gap of $M_0$ in $L^2(\mu)$, then for any $\varepsilon>0$ there exists a number $R\in(0,\infty)$ such that $M_{0,R}$ possesses a spectral gap in $L^2(\mu_R)$, i.e.,

$$\mathrm{gap}(M_{0,R}) = 1 - \|M_{0,R}\|_{\mu_R}\ge\mathrm{gap}(M_0) - \varepsilon,$$

where $\mu_R$ is as in (26) and $M_{0,R}$ is according to Definition 16.

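Proof. Given the results of Proposition 17 and Lemma 18 it suffices to prove that for any $\varepsilon>0$ there exists an $R>0$ such that $\sup_{u\in\mathcal H_R}M_0(u,\mathcal H_R^c)<\varepsilon$. We recall that the proposal kernel of $M_0$ is $P_0(u,\cdot) = N(\sqrt{1-s^2}\,u, s^2C)$ and obtain with $\mu_0^s := N(0,s^2C)$ that

$$\begin{aligned}
\sup_{u\in\mathcal H_R}M_0(u,\mathcal H_R^c) &\le \sup_{u\in\mathcal H_R}P_0(u,\mathcal H_R^c) = \sup_{u\in\mathcal H_R}\int_{\|\sqrt{1-s^2}\,u + v\|\ge R}\mu_0^s(\mathrm dv)
\le \sup_{u\in\mathcal H_R}\int_{\|\sqrt{1-s^2}\,u\| + \|v\|\ge R}\mu_0^s(\mathrm dv)\\
&= \sup_{u\in\mathcal H_R}\int_{\|v\|\ge R - \sqrt{1-s^2}\,\|u\|}\mu_0^s(\mathrm dv)
\le \int_{\|v\|\ge(1-\sqrt{1-s^2})R}\mu_0^s(\mathrm dv) = \mu_0(\mathcal H_{R_s}^c),
\end{aligned}$$

where $R_s = \frac{1-\sqrt{1-s^2}}{s}R$ and $\mu_0 = N(0,C)$. Again, since $\mu_0$ is a probability measure on $\mathcal H$, we know that there exists a number $R$ such that $\mu_0(\mathcal H_{R_s}^c)<\varepsilon$. □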
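The key estimate in the proof of Theorem 19 can be checked in one dimension, an illustrative assumption with $C = c = 1$ and $s = 0.3$: for every $u$ in $\mathcal H_R = (-R,R)$, the probability that the pCN proposal $N(\sqrt{1-s^2}\,u, s^2c)$ leaves $\mathcal H_R$ stays below $\mu_0(\mathcal H_{R_s}^c)$ with $R_s = (1-\sqrt{1-s^2})R/s$. The sketch scans a grid of $u$ values; the function names are hypothetical.

```python
import math

def prob_outside(mean, var, R):
    """P(|X| >= R) for X ~ N(mean, var) on the real line."""
    scale = math.sqrt(2 * var)
    return 0.5 * math.erfc((R - mean) / scale) + 0.5 * math.erfc((R + mean) / scale)

def theorem19_bound(R, s=0.3, c=1.0, grid=2000):
    """Compare sup_{|u| < R} P_0(u, H_R^c) with mu_0(H_{R_s}^c) in the scalar case."""
    a = math.sqrt(1 - s ** 2)
    us = [-R + 2 * R * k / grid for k in range(1, grid)]   # grid inside H_R = (-R, R)
    worst = max(prob_outside(a * u, s ** 2 * c, R) for u in us)
    R_s = (1 - a) * R / s
    bound = math.erfc(R_s / math.sqrt(2 * c))              # mu_0(|v| >= R_s), mu_0 = N(0, c)
    return worst, bound

for R in (5.0, 10.0, 20.0):
    worst, bound = theorem19_bound(R)
    print(f"R = {R}: sup P_0(u, H_R^c) = {worst:.3e} <= {bound:.3e}")
```

Since $(1-\sqrt{1-s^2})/s$ is small for small $s$, the radius needed for a prescribed $\varepsilon$ grows roughly like $2/s$ times the corresponding radius for $\mu_0$.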


4.5 Spectral gap of restricted gpCN Metropolis 

Now, we are able to formulate and to prove our main convergence result. 

Theorem 20 (Convergence of restricted gpCN Metropolis). Let $\mu$ be as in (1) and assume that the pCN Metropolis kernel possesses a spectral gap in $L^2(\mu)$, i.e., $\mathrm{gap}(M_0)>0$. Then, for any $\Gamma\in\mathcal L^1_+(\mathcal H)$ and any $\varepsilon\in(0,\mathrm{gap}(M_0))$ there exists a number $R_0 = R_0(\varepsilon)\in(0,\infty)$ such that for any $R>R_0$ it holds that

$$\|\mu-\mu_R\|_{\mathrm{tv}}<\varepsilon \quad\text{and}\quad \mathrm{gap}(M_{\Gamma,R})>0,$$

where $\mathrm{gap}(M_{\Gamma,R}) = 1 - \|M_{\Gamma,R}\|_{\mu_R}$ denotes the spectral gap of $M_{\Gamma,R}$ in $L^2(\mu_R)$.

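Proof. By Theorem 19 and the identity $\|\mu_R-\mu\|_{\mathrm{tv}} = 2\mu(\mathcal H_R^c)$ following Definition 15, for any $\varepsilon\in(0,\mathrm{gap}(M_0))$ there exists a number $R_0\in(0,\infty)$ such that for any $R>R_0$ it holds that

$$\|\mu-\mu_R\|_{\mathrm{tv}}<\varepsilon \quad\text{and}\quad \mathrm{gap}(M_{0,R})>0.$$

Moreover, Proposition 17, Lemma 18 and Theorem 11 yield that for any $\Gamma\in\mathcal L^1_+(\mathcal H)$ the Markov operators associated to $M_{0,R}$ and $M_{\Gamma,R}$ are self-adjoint and positive on $L^2(\mu_R)$. In particular, $M_{\Gamma,R}$ is again a Metropolis kernel with proposal $P_\Gamma$ and acceptance probability $\alpha_R$. Thus, in order to apply Theorem 8 to $M_{0,R}$ and $M_{\Gamma,R}$ it remains to verify that there exists a $p>1$ such that

$$K_{p,R} := \sup_{\mu_R(A)\in(0,1/2]}\frac{\int_A\int_{A^c}\rho_\Gamma(u,v)^p\,P_\Gamma(u,\mathrm dv)\,\mu_R(\mathrm du)}{\mu_R(A)}<\infty,$$

where $\rho_\Gamma(u,v) = \frac{\mathrm dP_0(u)}{\mathrm dP_\Gamma(u)}(v)$. Choosing $p\in\big(1, 1+\frac{1}{2\|H_\Gamma\|}\big)$, the constant $b$ of Theorem 14 is nonnegative, and since $\mu_R$ is concentrated on $\mathcal H_R$, Theorem 14 yields

$$K_{p,R}\le\sup_{\mu_R(A)\in(0,1/2]}\frac{\int_A c\exp(b\|u\|^2)\,\mu_R(\mathrm du)}{\mu_R(A)}\le c\exp(bR^2)<\infty.$$

Hence, Theorem 8 leads to

$$\mathrm{gap}(M_{\Gamma,R})\ge\frac12\Big(\frac{\mathrm{gap}(M_{0,R})}{2}\Big)^{2p/(p-1)}K_{p,R}^{-2/(p-1)}>0,$$

which proves the assertion. □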


Theorem 20 tells us that the restricted gpCN Metropolis converges exponentially fast to any, arbitrarily close, restriction $\mu_R$ of $\mu$ whenever the pCN Metropolis has a spectral gap, e.g., under the conditions of [13, Theorem 2.14]. In particular, Theorem 20 is a statement about the inheritance of geometric convergence from the pCN to the restricted gpCN Metropolis. We emphasize that a quantitative comparison of their spectral gaps is not proven: we only provide a lower bound for $\mathrm{gap}(M_{\Gamma,R})$ which depends nonlinearly on the spectral gap of the pCN Metropolis. Additionally, the stated estimate behaves rather poorly in $R$; more precisely, it decays exponentially as $R\to\infty$.

Although we argued in the above theorem with restrictions of $\mu$ in order to bound $K_p$ from Theorem 8, let us mention that in simulations, when $R$ is sufficiently large, one cannot distinguish between $\mu$ and $\mu_R$, nor between Markov chains with transition kernels $M_\Gamma$ and $M_{\Gamma,R}$.

Moreover, we conjecture that the gpCN Metropolis targeting $\mu$ has a strictly positive spectral gap whenever the pCN Metropolis has one. Recalling the results of the numerical simulations in Section 3.3, we even conjecture that the spectral gap of the gpCN Metropolis with suitably chosen $\Gamma\in\mathcal L^1_+(\mathcal H)$ is much larger than that of the pCN Metropolis.

5 Outlook on gpCN proposals with state-dependent covariances

In this section we comment on state-dependent proposal covariances, as they are a natural extension of the idea behind the gpCN proposal. The advantage of such a state-dependent approach is that the resulting Metropolis algorithm might be even better adapted to the target measure by allowing locally different proposal covariances. For an illustrative motivation of state-dependent proposal covariances we refer to [11], [22], and for recent positive and negative theoretical results we refer to [20]. In the Hilbert space setting we are now able to define MH algorithms by means of Theorem 13. Consider the proposal kernel

$$P_{loc}(u,\cdot) = N\big(A_{\Gamma(u)}u, s^2C_{\Gamma(u)}\big), \qquad (29)$$

where we assume that for $u\in\mathcal H$ we have $\Gamma(u)\in\mathcal L^1_+(\mathcal H)$ and that the corresponding mapping $u\mapsto\Gamma(u)$ is measurable. Further, by $A_{\Gamma(u)}$ and $C_{\Gamma(u)}$ we denote the components of the gpCN proposal for $\Gamma = \Gamma(u)$. Following the heuristic presented in Section 3.1 for Bayesian inference problems where $\Phi$ in (1) is of the form (10), we could choose, for instance,

$$\Gamma(u) = \nabla G(u)^*\Sigma^{-1}\nabla G(u). \qquad (30)$$

When considering the measure $\eta_{loc}(\mathrm du,\mathrm dv) = P_{loc}(u,\mathrm dv)\mu_0(\mathrm du)$ we notice that $\eta_{loc}$ is no longer a Gaussian measure due to the dependence of $\Gamma$ on $u$. However, to construct a $\mu$-reversible Metropolis kernel with the proposal $P_{loc}$ above, we can apply the same trick as in [1, Theorem 4.1]. Namely, with $\rho_\Gamma(u,v) = \frac{\mathrm dP_0(u)}{\mathrm dP_\Gamma(u)}(v)$ given in Theorem 13 we obtain

$$\begin{aligned}
P_{loc}(u,\mathrm dv)\mu_0(\mathrm du) &= \frac{1}{\rho_{\Gamma(u)}(u,v)}P_0(u,\mathrm dv)\mu_0(\mathrm du) = \frac{1}{\rho_{\Gamma(u)}(u,v)}P_0(v,\mathrm du)\mu_0(\mathrm dv)\\
&= \frac{\rho_{\Gamma(v)}(v,u)}{\rho_{\Gamma(u)}(u,v)}P_{loc}(v,\mathrm du)\mu_0(\mathrm dv),
\end{aligned}$$

where we used the $\mu_0$-reversibility of the pCN proposal $P_0$. Hence, according to the general Metropolis kernel construction outlined in Section 2.2, a Metropolis kernel $M_{loc}$ with proposal $P_{loc}$ and acceptance probability

$$\alpha_{loc}(u,v) = \min\Big\{1, \exp(\Phi(u)-\Phi(v))\,\frac{\rho_{\Gamma(u)}(u,v)}{\rho_{\Gamma(v)}(v,u)}\Big\} \qquad (31)$$

is $\mu$-reversible. Note that the same construction can analogously be applied to proposals of the form

$$\tilde P_{loc}(u,\cdot) = N\big(\sqrt{1-s^2}\,u, s^2C_{\Gamma(u)}\big), \qquad (32)$$

where the modified acceptance probability is then given by

$$\tilde\alpha_{loc}(u,v) = \min\Bigg\{1, \exp(\Phi(u)-\Phi(v))\,\frac{\pi_{\Gamma(u)}\big(\frac1s(v - \sqrt{1-s^2}\,u)\big)}{\pi_{\Gamma(v)}\big(\frac1s(u - \sqrt{1-s^2}\,v)\big)}\Bigg\}, \qquad (33)$$

with $\pi_\Gamma$ as stated in Theorem 13. The arguments above show that algorithms of this type are well-posed in infinite dimensions. Of course, the question arises whether the additional computational cost of evaluating $\Gamma(u)$ and $\rho_\Gamma$ or $\pi_\Gamma$ in each step pays off in a significantly higher statistical efficiency. Related to this concern, one could think of substituting $\nabla G(u)$ in (30) by a cheaper approximation in order to reduce the computational work. This might help to make MH algorithms with local proposal covariances feasible. Unfortunately, the tools and results developed in Section 4 are not sufficient to prove spectral gaps of these MH algorithms with state-dependent proposals. The main reason for this is the missing reversibility of the proposals w.r.t. $\mu_0$. This condition played a key role in Theorem 8, which is why the analysis of Section 4 is not applicable here. We leave this open for future research.
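The state-dependent construction (32)-(33) can be sketched in one dimension. The test problem is entirely our own assumption and not from the paper: a hypothetical forward map $G(u) = u + u^3/3$, data $y = 0.8$, noise variance $0.1$, prior $N(0,1)$, $\Phi(u) = (y - G(u))^2/(2\cdot 0.1)$ and $\Gamma(u) = G'(u)^2/0.1$ in the spirit of (30). Working with log-densities to avoid overflow, the code checks detailed balance $\mu(\mathrm du)\tilde P_{loc}(u,\mathrm dv)\tilde\alpha_{loc}(u,v) = \mu(\mathrm dv)\tilde P_{loc}(v,\mathrm du)\tilde\alpha_{loc}(v,u)$ at random pairs and runs a few Metropolis steps.

```python
import math
import random

# Hypothetical scalar test problem (illustrative assumption, not from the paper).
c, s, y, sigma2 = 1.0, 0.5, 0.8, 0.1
a = math.sqrt(1 - s ** 2)

def G(u):
    return u + u ** 3 / 3

def Phi(u):
    return (y - G(u)) ** 2 / (2 * sigma2)

def Gamma(u):
    return (1 + u ** 2) ** 2 / sigma2            # G'(u)^2 / sigma2, cf. (30)

def log_normal(x, mean, var):
    return -(x - mean) ** 2 / (2 * var) - 0.5 * math.log(2 * math.pi * var)

def log_pi_gamma(w, gamma):
    """log pi_Gamma(w) from (23) in the scalar case."""
    return 0.5 * gamma * w ** 2 - 0.5 * math.log(1 + c * gamma)

def log_q(u, v):
    """Log density of the proposal (32): N(a u, s^2 C_{Gamma(u)})."""
    c_gamma = c / (1 + c * Gamma(u))
    return log_normal(v, a * u, s ** 2 * c_gamma)

def log_alpha(u, v):
    """Log of the acceptance probability (33)."""
    log_r = (Phi(u) - Phi(v)
             + log_pi_gamma((v - a * u) / s, Gamma(u))
             - log_pi_gamma((u - a * v) / s, Gamma(v)))
    return min(0.0, log_r)

def log_target(u):
    """Unnormalized log density of mu w.r.t. Lebesgue measure."""
    return -Phi(u) + log_normal(u, 0.0, c)

def mh_step(u, rng):
    c_gamma = c / (1 + c * Gamma(u))
    v = rng.gauss(a * u, s * math.sqrt(c_gamma))
    return v if rng.random() < math.exp(log_alpha(u, v)) else u

rng = random.Random(0)
max_err = 0.0
for _ in range(1000):
    u, v = rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)
    lhs = log_target(u) + log_q(u, v) + log_alpha(u, v)
    rhs = log_target(v) + log_q(v, u) + log_alpha(v, u)
    max_err = max(max_err, abs(lhs - rhs))
print("max |log-flux difference|:", max_err)

rng2 = random.Random(1)
u = 0.0
for _ in range(200):
    u = mh_step(u, rng2)
print("state after 200 steps:", u)
```

The detailed-balance check only confirms the acceptance ratio; it says nothing about the efficiency of such a sampler, which the text leaves as an open question.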
Appendix

A Gaussian measures 

The following brief introduction to Gaussian measures is based on the presentations given in [6, Section 1] and [12, Section 3]. Another comprehensive reference for this topic is [2].

Let $\mathcal H$ be a Hilbert space with norm $\|\cdot\|$ and inner product $(\cdot,\cdot)$, and let $\mathcal L^1_+(\mathcal H)$ denote the set of all linear, bounded, self-adjoint, positive and trace class operators $A\colon\mathcal H\to\mathcal H$.

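Let $\mu$ be a measure on $\mathcal H$ and for simplicity let us assume that $\int_{\mathcal H}\|u\|^2\,\mu(\mathrm du)<\infty$. The mean $m\in\mathcal H$ of $\mu$ is defined as the Bochner integral $m = \int_{\mathcal H}u\,\mu(\mathrm du)$ and the covariance of $\mu$ is the unique operator $C\in\mathcal L^1_+(\mathcal H)$ given by

$$(Cu,u') = \int_{\mathcal H}(u,v-m)(u',v-m)\,\mu(\mathrm dv), \qquad \forall u,u'\in\mathcal H.$$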

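A measure $\mu$ on $\mathcal H$ is called a Gaussian measure with mean $m\in\mathcal H$ and covariance operator $C\in\mathcal L^1_+(\mathcal H)$, denoted by $N(m,C)$, iff

$$\int_{\mathcal H}e^{i(u,v)}\,\mu(\mathrm dv) = e^{i(u,m) - \frac12(Cu,u)}, \qquad \forall u\in\mathcal H.$$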

This definition is equivalent to $\langle u\rangle_\#\mu = N((u,m),(Cu,u))$ for all $u\in\mathcal H$, where $\langle u\rangle\colon\mathcal H\to\mathbb R$ with $\langle u\rangle(v) := (u,v)$ and where $\langle u\rangle_\#\mu$ denotes the pushforward measure of $\mu$ under the mapping $\langle u\rangle$. Gaussian measures are uniquely determined by their mean and covariance, i.e., for any $m\in\mathcal H$ and any $C\in\mathcal L^1_+(\mathcal H)$ there exists a unique Gaussian measure $\mu = N(m,C)$ on $\mathcal H$. Moreover, the set of random variables on $\mathcal H$ distributed according to a Gaussian measure is closed w.r.t. affine transformations. In detail, let $X\sim N(m,C)$ be a Gaussian random variable on $\mathcal H$ and let $b\in\mathcal H$ and $T\colon\mathcal H\to\mathcal H$ be a bounded, linear operator; then, due to [6, Proposition 1.2.3], we have

$$b + TX\sim N(b + Tm, TCT^*). \qquad (34)$$
The Cameron-Martin space $\mathcal H_\mu$ of a Gaussian measure $\mu = N(m,C)$ on $\mathcal H$ is defined as the image space $\mathrm{Im}\,C^{1/2}$, which, equipped with $(u,v)_{C^{-1}} := (C^{-1/2}u, C^{-1/2}v)$, again forms a Hilbert space. The space $\mathcal H_\mu$ has some surprising properties: it is the intersection of all measurable linear subspaces $X\subseteq\mathcal H$ with $\mu(X) = 1$; if $\ker C = \{0\}$, then $\mathcal H_\mu$ is dense in $\mathcal H$; and if $\mathcal H$ is infinite dimensional, then $\mu(\mathcal H_\mu) = 0$. Moreover, the space $\mathcal H_\mu$ plays an important role for the equivalence of Gaussian measures, as rigorously expressed in the Cameron-Martin theorem below. Before stating the result we need some more notation.

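In the following let $\mu = N(0,C)$. For $u\in\mathrm{Im}\,C^{1/2}$ we set

$$W_u(v) := (C^{-1/2}u, v), \qquad \forall v\in\mathcal H,$$

and understand $W_u$ as an element of $L^2(\mu)$. Since the mapping $\mathrm{Im}\,C^{1/2}\ni u\mapsto W_u\in L^2(\mu)$ is an isometry [6, Section 1.2.4], we can define for any $u\in\mathcal H$

$$W_u(\cdot) := L^2(\mu)\text{-}\lim_{n\to\infty}W_{u_n},$$

where $u_n\in\mathrm{Im}\,C^{1/2}$ and $u_n\to u$ in $\mathcal H$ as $n\to\infty$. By [6, Proposition 1.2.7] it holds that

$$\int_{\mathcal H}e^{W_u(v)}\,\mu(\mathrm dv) = e^{\frac12\|u\|^2}, \qquad \forall u\in\mathcal H. \qquad (35)$$

Hence, if $h\in\mathrm{Im}\,C^{1/2}$, we understand $(C^{-1}h,\cdot)$ as $W_{C^{-1/2}h}(\cdot)\in L^2(\mu)$.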

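Theorem 21 (Cameron-Martin formula, [6, Theorem 1.3.6]). Let $\mu = N(0,C)$ and $\mu_h = N(h,C)$ be Gaussian measures on a separable Hilbert space $\mathcal H$. Then $\mu$ and $\mu_h$ are equivalent iff $h\in\mathcal H_\mu = \mathrm{Im}\,C^{1/2}$, in which case

$$\frac{\mathrm d\mu_h}{\mathrm d\mu}(v) = \exp\Big(-\frac12\|C^{-1/2}h\|^2 + (C^{-1}h,v)\Big).$$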

Thus, two Gaussian measures $N(m,C)$ and $N(m+h,C)$ are only equivalent if $h\in\mathrm{Im}\,C^{1/2}$. Consider now $\mu = N(0,C)$ and $\nu = N(0,Q)$ with $C\neq Q$. Before stating a theorem about the equivalence of $\mu$ and $\nu$, we need some more notation. In the following let $T\colon\mathcal H\to\mathcal H$ be a self-adjoint trace class operator and let $(t_n)_{n\in\mathbb N}$ denote the sequence of its eigenvalues. We set

$$\det(I+T) := \prod_{n=1}^\infty(1+t_n)$$

and define


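$$(TC^{-1/2}u, C^{-1/2}u) := \lim_{N\to\infty}(TC^{-1/2}\Pi_Nu, C^{-1/2}\Pi_Nu), \qquad \mu\text{-a.e.},$$

where $\Pi_N$ denotes the projection operator onto $\mathrm{span}\{e_1,\dots,e_N\}$ with $e_n$ denoting the $n$th eigenvector of $C$. The existence of the $\mu$-a.e. limit above is proven in [6, Proposition 1.2.10] and, furthermore, if $(Tu,u)<\|u\|^2$ holds for any $u\in\mathcal H$, $u\neq 0$, then by [6, Proposition 1.2.11] we have

$$\int_{\mathcal H}\exp\Big(\frac12(TC^{-1/2}u, C^{-1/2}u)\Big)\,\mu(\mathrm du) = \frac{1}{\sqrt{\det(I-T)}}. \qquad (36)$$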


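Theorem 22 ([6, Proposition 1.3.11]). Let $\mu = N(0,C)$ and $\nu = N(0,Q)$ be Gaussian measures on a separable Hilbert space $\mathcal H$. If $T := I - C^{-1/2}QC^{-1/2}$ is self-adjoint, trace class and satisfies $(Tu,u)<\|u\|^2$ for any $u\in\mathcal H$, $u\neq 0$, then $\mu$ and $\nu$ are equivalent with

$$\frac{\mathrm d\nu}{\mathrm d\mu}(u) = \frac{1}{\sqrt{\det(I-T)}}\exp\Big(-\frac12\big(T(I-T)^{-1}C^{-1/2}u, C^{-1/2}u\big)\Big).$$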


We note that the assumptions of Theorem 22 can be relaxed to $I - C^{-1/2}QC^{-1/2}$ being Hilbert-Schmidt, which is known as the Feldman-Hajek theorem. Also in this case an expression for the Radon-Nikodym derivative can be obtained, see [2, Corollary 6.4.11].

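Finally, we recall two simple but useful facts resulting from a change of variables.

Lemma 23. Let $\mathcal H$ be a separable Hilbert space, $0<s<\infty$ and $h\in\mathcal H$.

• Assume $\mu = N(m,C)$ and $\nu = N(sm+h, s^2C)$ on $\mathcal H$, and let $f\colon\mathcal H\to\mathbb R$ be $\mu$-integrable. Then
$$\int_{\mathcal H}f(v)\,\mu(\mathrm dv) = \int_{\mathcal H}f\Big(\frac1s(v-h)\Big)\,\nu(\mathrm dv).$$

• Assume $\mu_1 = N(m_1,C_1)$ and $\mu_2 = N(m_2,C_2)$ are equivalent with $\frac{\mathrm d\mu_2}{\mathrm d\mu_1}(v) = \pi(v)$. Then the measures $\nu_1 = N(sm_1+h, s^2C_1)$ and $\nu_2 = N(sm_2+h, s^2C_2)$ are also equivalent with
$$\frac{\mathrm d\nu_2}{\mathrm d\nu_1}(v) = \pi\Big(\frac1s(v-h)\Big).$$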
B Proofs

The following proofs are rather operator theoretic and rely heavily on the holomorphic functional calculus. We refer to [8, Section VII.3] for a comprehensive introduction.


B.1 Proof of Lemma 2

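From the proof of Proposition 1 we know that $(I+H_\Gamma)^{-1}\colon\mathcal H\to\mathcal H$ is self-adjoint and that $\|(I+H_\Gamma)^{-1}\|\le 1$. Thus, $I - s^2(I+H_\Gamma)^{-1}$ is also a self-adjoint, bounded and positive operator on $\mathcal H$ and its square root operator appearing in (16) exists. This yields the well-definedness of $A_\Gamma\colon\mathrm{Im}\,C^{1/2}\to\mathcal H$. We now prove that $A_\Gamma$ is a bounded operator on $\mathrm{Im}\,C^{1/2}$. For $s = 0$ we get $A_\Gamma = I$ and the assertion follows, so we assume $s\in(0,1)$. Let us now define $f\colon\mathbb C\setminus\{-1\}\to\mathbb C$ by

$$f(z) = \sqrt{1 - s^2(1+z)^{-1}}.$$

The function $f$ is analytic in the complex half plane $\{z\in\mathbb C : \mathrm{Re}(z)>s^2-1\}$, since $\mathrm{Re}(1+z)>s^2$ implies

$$\mathrm{Re}\big((1+z)^{-1}\big) = \frac{\mathrm{Re}(1+z)}{|1+z|^2}\le\frac{1}{\mathrm{Re}(1+z)}<\frac{1}{s^2}.$$

Denoting $\gamma := \|H_\Gamma\|$, the spectrum of $H_\Gamma = C^{1/2}\Gamma C^{1/2}$ is contained in $[0,\gamma]$. Then, since $s<1$, $f$ is analytic in a neighborhood, say $\mathcal N[0,\gamma]$, of $[0,\gamma]$. Hence, by functional calculus we obtain

$$\sqrt{I - s^2(I+H_\Gamma)^{-1}} = f(H_\Gamma) = \frac{1}{2\pi i}\oint_{\partial\mathcal N[0,\gamma]}f(\zeta)(\zeta I - H_\Gamma)^{-1}\,\mathrm d\zeta.$$

Due to analyticity we can approximate $f$ by a sequence of polynomials $p_n$ of degree $n$ which converge uniformly on $\mathcal N[0,\gamma]$ to $f$ as $n\to\infty$. Then, by [8, Lemma VII.3.13] it holds that

$$\|p_n(H_\Gamma) - f(H_\Gamma)\|_{\mathcal H\to\mathcal H}\to 0$$

as $n\to\infty$. Since the polynomials $p_n$ can be represented as $p_n(z) = \sum_{k=0}^n a_k^{(n)}z^k$, we obtain further

$$C^{1/2}p_n(H_\Gamma) = \sum_{k=0}^n a_k^{(n)}C^{1/2}(C^{1/2}\Gamma C^{1/2})^k = \sum_{k=0}^n a_k^{(n)}(C\Gamma)^kC^{1/2} = p_n(C\Gamma)C^{1/2}.$$

By [14, Proposition 1] we have

$$\mathrm{spec}(C\Gamma\mid\mathcal H) = \mathrm{spec}(C^{1/2}\Gamma C^{1/2}\mid\mathcal H),$$

where $\mathrm{spec}(\cdot\mid\mathcal H)$ denotes the spectrum on $\mathcal H$, and, thus, we can conclude $\|p_n(C\Gamma) - f(C\Gamma)\|_{\mathcal H\to\mathcal H}\to 0$ as $n\to\infty$, again by [8, Lemma VII.3.13]. Hence,

$$C^{1/2}f(H_\Gamma) = \lim_{n\to\infty}C^{1/2}p_n(H_\Gamma) = \lim_{n\to\infty}p_n(C\Gamma)C^{1/2} = f(C\Gamma)C^{1/2}$$

and

$$A_\Gamma = C^{1/2}f(H_\Gamma)C^{-1/2} = f(C\Gamma)C^{1/2}C^{-1/2} = f(C\Gamma),$$

where $f(C\Gamma)$ is by construction a bounded operator on $\mathcal H$. □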

B.2 Proof of Lemma 12 

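By [7, Theorem 1] the relation $\mathrm{Im}(\tilde A_\Gamma)\subseteq\mathrm{Im}(C^{1/2})$ holds iff there exists a bounded operator $B\colon\mathcal H\to\mathcal H$ such that

$$\tilde A_\Gamma = C^{1/2}B. \qquad (37)$$

Thus, $\mathrm{Im}(\tilde A_\Gamma)\subseteq\mathrm{Im}(C^{1/2})$ is equivalent to $C^{-1/2}\tilde A_\Gamma$ being bounded on $\mathcal H$. In order to construct and analyze the operator $B$, we define $f\colon\mathbb C\setminus\{-1\}\to\mathbb C$ by

$$f(z) := \sqrt{1 - s^2(1+z)^{-1}} - \sqrt{1-s^2},$$

which is analytic in $\{z\in\mathbb C : \mathrm{Re}(z)>s^2-1\}$, cf. the proof of Lemma 2, and particularly in

$$V := \{z\in\mathbb C : \mathrm{dist}(z,[0,\gamma])<\varepsilon\}, \qquad 0<\varepsilon<1-s^2,$$

where $\gamma := \|H_\Gamma\|$. We have the following representation

$$-\tilde A_\Gamma = A_\Gamma - \sqrt{1-s^2}\,I = C^{1/2}\Big(\sqrt{I - s^2(I+H_\Gamma)^{-1}} - \sqrt{1-s^2}\,I\Big)C^{-1/2} = C^{1/2}f(H_\Gamma)C^{-1/2}$$

with

$$f(H_\Gamma) = \frac{1}{2\pi i}\oint_{\partial V}f(\zeta)(\zeta I - H_\Gamma)^{-1}\,\mathrm d\zeta,$$

see [8, Chapter VII.3]. Hence, if we can prove that $B = -f(H_\Gamma)C^{-1/2}$ is a bounded operator on $\mathcal H$, we have shown the assertion.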
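For this let $p_n(z) = \sum_{k=0}^n a_k^{(n)}z^k$, $n\in\mathbb N$, be polynomials of degree $n$ which converge uniformly on $V$ to $f$. Such polynomials exist due to the analyticity of $f$, and by the fact that $f(0) = 0$ we can assume w.l.o.g. that $a_0^{(n)} = 0$ for all $n\in\mathbb N$. This leads to

$$p_n(C^{1/2}\Gamma C^{1/2}) = \sum_{k=1}^n a_k^{(n)}(C^{1/2}\Gamma C^{1/2})^k = C^{1/2}\Gamma^{1/2}\,q_{n-1}(\Gamma^{1/2}C\Gamma^{1/2})\,\Gamma^{1/2}C^{1/2}$$

with $q_{n-1}(z) := \sum_{k=1}^n a_k^{(n)}z^{k-1} = p_n(z)/z$. Now, [14, Proposition 1] implies that the operators $C^{1/2}\Gamma C^{1/2}$ and $\Gamma^{1/2}C\Gamma^{1/2}$ share the same spectrum, since $C$ and $\Gamma$ are positive. Thus, $\mathrm{spec}(\Gamma^{1/2}C\Gamma^{1/2}\mid\mathcal H)\subseteq[0,\gamma]$ and we have

$$q_n(\Gamma^{1/2}C\Gamma^{1/2}) = \frac{1}{2\pi i}\oint_{\partial V}q_n(\zeta)(\zeta I - \Gamma^{1/2}C\Gamma^{1/2})^{-1}\,\mathrm d\zeta, \qquad n\in\mathbb N.$$

Moreover, the polynomials $q_n$ form a Cauchy sequence in $C(\partial V)$, since

$$\sup_{\zeta\in\partial V}|q_n(\zeta) - q_m(\zeta)|\le\frac{1}{\min_{\eta\in\partial V}|\eta|}\sup_{\zeta\in\partial V}|\zeta q_n(\zeta) - \zeta q_m(\zeta)| = \frac{1}{\min_{\eta\in\partial V}|\eta|}\sup_{\zeta\in\partial V}|p_{n+1}(\zeta) - p_{m+1}(\zeta)|,$$

where $\min_{\eta\in\partial V}|\eta| = \varepsilon>0$ due to our choice of $V$. Thus, the polynomials $q_n$ converge uniformly on $\partial V$ to a function $g$. This implies that the operators $q_n(\Gamma^{1/2}C\Gamma^{1/2})$ converge in the operator norm to a bounded operator

$$g(\Gamma^{1/2}C\Gamma^{1/2}) := \frac{1}{2\pi i}\oint_{\partial V}g(\zeta)(\zeta I - \Gamma^{1/2}C\Gamma^{1/2})^{-1}\,\mathrm d\zeta.$$

We arrive at

$$f(H_\Gamma) = \lim_{n\to\infty}p_n(C^{1/2}\Gamma C^{1/2}) = \lim_{n\to\infty}C^{1/2}\Gamma^{1/2}q_{n-1}(\Gamma^{1/2}C\Gamma^{1/2})\Gamma^{1/2}C^{1/2} = C^{1/2}\Gamma^{1/2}g(\Gamma^{1/2}C\Gamma^{1/2})\Gamma^{1/2}C^{1/2},$$

which yields

$$B = -f(H_\Gamma)C^{-1/2} = -C^{1/2}\Gamma^{1/2}g(\Gamma^{1/2}C\Gamma^{1/2})\Gamma^{1/2}$$

being bounded on $\mathcal H$. □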